Industrial Applicability Claims 1-20 



Claims 
2. Citations and Explanations 

1). The present international application relates to Limb-Girdle Muscular Dystrophy 
Type 2A (LGMD2A). LGMD2A is strongly correlated to mutations in the gene of the proteolytic 
enzyme calpain 3 (CANP3). 

The genomic organization of the human CANP3 gene was determined. 
The above-mentioned enzyme may also be referred to as 'p94' or 'nCL-1'. 

2). The following document has been considered for the purposes of this report: 
D1: THE JOURNAL OF BIOLOGICAL CHEMISTRY, Vol. 264, pp. 20106-20111, 1989 

3). The present application does not satisfy the criterion set forth in Article 33(2) PCT 
because the subject-matter of claims 1-8 is not new in respect of prior art as defined in the 
regulations (Rule 64(1)-(3) PCT). 

Document D1 discloses the nucleotide sequence of the human CANP3 gene and the deduced 
amino acid sequence (D1, Fig. 2). 

In view of the above comment, D1 is novelty destroying for claims 1-8 (Art. 33 (2) PCT). 
The involvement of the CANP3 gene in the etiology of LGMD2 is an implicit feature of the 
above sequences, and does not render novel the claimed subject-matter. 

4). The methods of claims 12-14 are novel (Art. 33 (2) PCT); however, they do not 
involve an inventive step (Art. 33 (3) PCT) for the following reasons. 
For the person skilled in the art, being aware of the known amino acid sequences of claims 5 to 
7 or of the known nucleic acid sequences of claims 1-4, the carrying out of said methods 
required nothing out of the ordinary and thus involved no inventive skill, all being a matter of 
technical convenience. In addition, this was a matter of normal design procedures for which 
neither 'creative thinking' nor 'inventive talent' was necessary. 

5). Claims 9-11 and 15-20 do meet the requirements of Art. 33 (2) and (3) for the 
following reasons: 

Before the priority date of the present international application, it was neither disclosed nor 
suggested that the CANP3 gene product, when mutated, is involved in the LGMD2 disease. 

In consequence, the use of sequences related to the CANP3 gene in the diagnosis of the 
LGMD2 disease or their use in the preparation of a pharmaceutical composition for the 
treatment of said disease could not be deduced in an obvious manner from the closest prior art 
document D1. 

The industrial applicability of the present set of claims is acknowledged (Art. 33 (4) PCT). 
VIII. Certain observations on the international application 

The following observations on the clarity of the claims, description, and drawings or on the question whether the claims are fully supported by 
the description, are made: 

- Claim 6 should read "...according to claim 5," 

- Claim 18 should read "...by nucleic acid amplification..." 
LGMD gene coding for a calcium dependent protease 

protease belonging to the calpain family which, when it is mutated, is a cause 
of a disease called Limb-Girdle Muscular Dystrophy (LGMD). 
The term limb-girdle muscular dystrophy (LGMD) was proposed by 
Walton and Nattrass (1954) as part of a classification of muscular dystrophies. 

LGMD is characterised by progressive symmetrical atrophy and weakness of the 
proximal limb muscles and by elevated serum creatine kinase. Muscle biopsies 
demonstrate dystrophic lesions and electromyograms show myopathic features. 
The symptoms usually begin during the first two decades of life and the disease 
gradually worsens, often resulting in loss of walking ability 10 or 20 years after 
onset (Bushby, 1994). Yet the precise nosological definition of LGMD still 

^prrereX-irrdT— 

~ ..phies haye .en ^ ytZl 7 ur thlir Ls^T: 

in unde^ ' ''~™'>«''^ ^-e Issues highlight difn^ 

an analysis of the .olecular and genetic defec.,s, invoLed in thll 

Attempts to identify the genetic l«sis of this disease go back over 35 

t" rr r '^r ~ — - -"oZs 

.s 16 per thousand persons". The same authors also stated that -Ihe 
eg-.ation analysis gives no evidence on whether these ge^ n dlrl 
^.n ,es are allelic or a. d^eren. loci". Both autosomal dominant an recTZ 
transmission have been reoorted tho t^n u • recessive 

estlm.t.H . T ' ^^'"^ """'^ ^^^'^^^ v^ith an 

estimated prevalence of 10-5 fEmerv iqqi\ tk^ . .• 

^ '^^^)- 'he loca isat on of a opne *rir o 

ei ai., 1992, Passos-Bueno et al 19^-^^ tho 
gene to chromosome 2 (LGMD2B, MIM 253601; Bashir et al., 1994). There is 
evidence that at least one other locus can be involved 

Firs, ce'net °' ^'^'^^ --P-^^- 'in^ngs 

, 2^'"^'"^"'°'"""'' .'emons.rated In ,he highly Inhred ,nd,ana Am.sh 

Zsr '^'^ "^""'^ •^o'^. ™ o 

represent a genetic isolate, at least 6 different disease haplotypes were 
observed, providing evidence against the hypothesis of a single founder effect 
(Beckmann et al., 1991) in this inbred population. 

The nonspecific nosological definition, the relatively low prevalence and 

.here ,s n„ . ' '° "^-^""^'^^^ ^"V ^'eficiency. In addrtion 

^eZrrarr ^^^^^^ - — - - 

s.ra.egy. ' ^ " *° ^ P«W°nal cloning 

It is established that the LGMD2 chromosomal region is localized on 
chromosome 15, in the 15q15.1-15q21.1 region (Fougerousse et al., 1994). 

;£err:~^^^^^^ 

^^^^^^^^^^^^ -^.^d a 
scree^tgrr;^ °' ^ — - - - 

y y cuixA selection (Lovett et al.. 1991 , Tagle et al 100-?^ f„ 
expressed sequences encoded by this interval led to^e iden,r " " 

subunit 1,' whi^ .:r:pr::::~ ^^^^^^ 
Calpains are non-lysosomal intracellular cysteine proteases which require 
calcium for their catalytic activities (for a review see Croall D.E. et al., 1991). The 

as tissue-spec,„c prote.ns. In adoition to the n,uscle specific nCL1 sto^act 

' ZZ "^'^ tnese a. "1:, 

rom he same gene Py alternative splicing. The ubiquitous enzy,r»s consis o 

2:: it!::;:: iro"" -^-^'-^ ---- 

are potent,a, ca J. 1"::^ r ~ "-^'^^ 

known homology are present in the muscle-specific nCL1 protein, namely NS, 
IS1 and IS2, the latter containing a nuclear translocation signal. These regions 
may be important for the muscle-specific function of nCL1. 

or ^^r:::::!::::::^:::::::' - w. e^ess 

are the use of J. approaches for curing these diseases 

35930: orl 52S~ '^^^ ' ~ ^ ^ 

w.ch is e.presser::~r ^'^ ~ '^^ ^^'^^ °' ^ ~ 

The invention relates to the nucleic acid sequence surh = 
Figure 2 coding for a Ca" n^r, - sequence such as represented in 

LGMD2 disease and mor " "^'^ '"-'-^ - 

sequence p oTded is alT"? " '^'"^^ '° ^ °' «^ 

protease acti: n ^ ^ ' """^ ^ -'—pendent 

---encesbysups;:.rdL:o?a:;rof:r °' 

provided that said sequence is still , nucleotides 

........... rn:;' ~„-.r-« -» 
The genomic organisation of the human nCL1 gene has been determined by 
the inventors, and consists of 24 exons and extends over 40 kb as represented 

Tee?::: " ' ' °' 3^ - °' «s Z ha e 

. =rof~r:— ^-^^^^^^ 

independent mutational events in nCLI »r» °' 
c-K ^"^^ responsible for LGIWDPa 

• ~ 

15. ^ "^"-^ 9®"® on chromosome 

a..r:o:::e::errhr ri" ^ ^"^^ '^^'^^^^ - - 

-nmutated^ntheloToVra:^^^^^^^^^ ^"^^ ^ 
The cDNA of the gene coding for CANP3 whir^ w 

also represented in Fiaur. 9 . w """"^'"^ P^°^®*'^. 

ented rn Figure 2, and ,s a part of the invention 

.elon;:;.:;:-::::: a ca,cium.dependen. protease 

.er,vr,r::;::r;::;::n --^ — 

~. or b. mutatlo ^To 3 r^ ™^' 

P-ided that the translated prol has the o' 7"'°"' " '"''^ 

acvit. and .hen mutated, .nduoe .^.O^^Zr^''' — — 

The inven,,on als rl / ra '7^""^""' "''"^^ ^ 
-.uenee of .he Invention .nj JZ^r:! ' 

expression ofthecalpaln in an approp:!::::!; ^ ~ '"^"^ '^^ 

invention. sequence of Figure 2 is a part of the 

Such a host cell might be either: 
- a cell which is able to secrete the 

.e .se. as . . J ~ ^ ' ^^^^ 

- a packaging cell line transfected by a viral or r«tr^ . 

lines bearing recombinant vector might be use 1 ! ' 
5 LGMD2. ^ ^'■"9 ^^-^ 9^"^ therapy of 

are included herein by reference °' ''^ 

LGMD was established " '^^ "'e 

Legend of the figures 
Figure 1 

A) Genomic organisation of the nCL1 gene 

"-:rd:;rZnran: rs t^- '---^ 

-^icated by numbered ven,ca, baTs Thl , " ^^'"^ 
regains ,o be fully seauenced P^ r 

by as.ens.s^ ArroL ndra" 1 ! 'T ^"'"''"^^ ™crosa,e,li.es are indica.ed 
-pea. sequences. (greyed, 
B) EcoRI restriction map 

bar. The si^e of the correspond.nn ^= " 

wben determ,ned by se Jenr^ylr* ^"^ 
C) Cosmid map of the nCL1 gene region 

-o/ZlTprraLr:: ~ subcon.. V.C 

— posit,. SXS^s ^ndttlTbred^^ ^'^ " ^ 

cosm.ds cover the entire gene. T3.T7 ^^"S'^s)- A minimum of three 



WO 96/16175 

PCT/EP95/04575 

Figure 2: Sequence of the human nCL1 cDNA (B) and of the flanking 5' (A) and 3' 
(C) genomic regions. 

A) and C) The polyadenylalion signal and putative CAAT Tataa . 
boxed. Putative Spi (position ^77 to ^72) MEF2 bin^ ' 

5 and CArG box (-685 to Ji7,, 1 ^ •° 

region is underlie ' '^^"^"^ — " 5' 

B) The corresponding amino acids are shown below the sequence. The coding 
sequence between the ATG initiation codon and the TGA stop codon is 2466 bp, 
encoding an 821 amino acid protein. The adenine in the first methionine codon 
has been assigned position 1. Locations of introns within the nCL1 sequence are 
indicated by arrowheads. Nucleotides which differ from the previously published 
ones are indicated by asterisks. 
Figure 3: Alignments of amino acid sequences of the muscle-specific calpains 
' sequent: r^'nl :T " °" ^ ^ ^^-Ic 

- sequent aI: ™ r ranT/r '° 
-no acid sequences -odldTy r nd Ze rprrsl^'r^^^^'' 

a^racrrru~a"° - ::r:: 

-ipains are ,n reter rs 1 rT"/™"' ""^ '^"^^'^ '"^ 
present ,n the sequenc T^er^ T oT" ^^'^^ 
homologous sequence Positiln o 

-ove the mutated alo a:: """^"^^ ^^^^ ^ 

Figure 4: Distribution of the mutations along nCL1 protein structure 

=o.es:or:::::r - - ~ to the 

mutations within nCLI dom« ^ °' "^'^^^"se 

'inin nuLl domain are indicated bv black nr.*. 
nonsense and frameshift rr. , . ®^ect of 

Trameshift mutations are illustrated ac ♦ 
representing the extent of nro. "'ustrated as truncated lines, 

y me extent of protein synthesised Name of tho ^ 
families are indicated on the left of fh. . -r. corresponding 
hatched lines. °' ^^^^ is given by 
Figure 5: Northern blot hybridisation of a nCL1 clone 

A mRNA blot (Clontech) containing 2 µg of poly(A)+ RNA from each of 
eight human tissues was hybridised with a nCL1 genomic clone spanning exons 
20 and 21. The latter detects a 3.6 kb mRNA present only in a lane 
corresponding to the skeletal muscle mRNA. 

Figure 6: Representative mutations identified by heteroduplex analysis 

Examples of mutation screening by heteroduplex analysis. Pedigree B505 
shows the segregation of two different mutations in exon 22. 
Figure 7: Homozygous mutations in the nCL1 gene 
Sequencing of mutations in exons 2 (a), 8 (b), 13 (c) and 22 (d). 
Sequences from a healthy control are shown above each mutant sequence. 
Asterisks indicate the position of the mutated nucleotides. The consequences on 
codon and amino acid residues are indicated on the left of the figure, together 
with the name of the family. 
Figure 8: Structure of the nCL1 gene 

Figure 8A represents the 5' part of the gene with exon 1. 
Figure 8B represents the part of the gene including exons 2 to 8. 
Figure 8C represents the part of the gene including exon 9. 
Figure 8D represents the part of the gene including exons 10 to 24, 
including the 3' non-transcribed region. 

EXAMPLES 

EXAMPLE 1 

Localisation of the nCL1 within the LGMD2A interval 

Detailed genetic and physical maps of the LGMD2A region were 
constructed (Fougerousse et al., 1994), following the primary linkage assignment 
to 15q (Beckmann et al., 1991). Recombination events restricted the region to 
the interval between D15S129 and D15S143, defining the boundaries of the 
LGMD2A region as 15q15.1-15q21.1 (Fougerousse et al., 1994). Construction 
and analysis of a 10-12 Mb YAC contig (Fougerousse et al., 1994) made it possible 
to map 33 polymorphic markers within this interval and to further narrow the 
LGMD2A region to between D15S514 and D15S222. 
The nCL1 gene had been localised to chromosome 15 by hybridisation with 
sorted chromosomes and by Southern hybridisation to DNA from human-mouse 
cell hybrids (Ohno et al., 1989). cDNA capture using YACs from the LGMD2A 
interval allowed the identification of thirteen positional candidate genes. nCL1 
was one of the two transcripts identified that showed muscle-specific expression, 
as evidenced by northern blot analysis. The localisation was further confirmed by 
STS (Sequence Tagged Site) assays. Primers used for the localisation of the 
nCL1 gene are P94in2, P94in13 and pcr6a3, as shown in Figure 1; their 
characteristics are defined in Table 1. 
Table 1. PCR primers used for localisation of the nCL1 gene. 

Primer name Primer sequence (5'-3') 



PGR produa size on 



P94ml3 



P94-6a3 



P94exlter 



CL/fMA 

ATGGAUCCAACAGAACTGA 341-360 

£^ 428-448 

GTATGACTCGGAAAAGAAG 
GT 

TAAGCAAAAGCAGTCCCCA 1893-1912 

TTGCTGTTCCTCACTTTCCT ^'^^''^^^ 
G 

G7TTCATCTGCTGCTTCGTT 2342-2361 

CTGGTTCAGGCATACATGG 2452-2471 

TTCTTTATGTGGACCCTGAG 218-239 

^ 275-293 
ACGAACTGGATGGGGAACT 



58 



56 



55 



64 



130 



76 



1043 



818 



76 



These primers were designed from different parts of the published human 
cDNA sequence (Sorimachi et al., 1989), and were used for an STS content 
screening on DNA from three chromosome 15 somatic cell hybrids and YACs 
from the LGMD2A contig. The results positioned the gene in a region previously 
defined as 15q15.1-q21.1 and on 3 YACs (774G4, 926G10, 923G7) localised in 
this region. The relative positions of STSs along the LGMD2A contig allowed the 
gene to be localised between D15S512 and D15S488, in a candidate region 
suggested by linkage disequilibrium studies. 

The same primers as above were used to screen a cosmid library from YAC 
774G4. A group of 5 cosmids was identified (Fig. 1). Experiments with another 
nCL1 primer pair (P94ex1ter; Table 1) established that these cosmids cover all 
nCL1 exons except number 1, and that a second group of 4 cosmids contains this 
exon (Fig. 1). A minimal set of three overlapping cosmids (2G8-2B11-1F11) 
covers the entire gene (Figure 1). DNA from these cosmids was used to 
construct an EcoRI restriction map of this region (Figure 1B). 
EXAMPLE 2 

Determination of the nCL1 gene sequence 

Most of the sequences were obtained through shotgun sequencing of partial 
digests of cosmid 1F11 subcloned in M13 and Bluescript vectors, and by walking 
with internal primers. The sequence assembly was made using the XBAP 
software of the Staden package (Staden) and was in agreement with the 
restriction map of the cosmids. Sequences of exon 1 and adjacent regions were 
obtained by sequencing cosmid DNA or PCR products from human genomic 
DNA. The first intron is still not fully sequenced, but there is evidence that it may 
be between 10 and 16 kb in length (based on hybridisation of restriction fragments; 
data not shown). The entire gene, including its 5' and 3' regions, is more than 40 
kb long, and is shown in Figure 8. 
a) The cDNA sequence 

The technology used allowed the published human cDNA sequence of nCL1 
(Sorimachi et al., 1989) to be completed. The sequence of Figure 2 contains the 
missing 129 bases corresponding to the N-terminal 43 amino acids. It also differs 
from the published sequence at 12 positions, three of which occur at third base 
positions of codons and preserve the encoded amino acid sequence. The other 
9 differences lead to changes in amino acid composition (Figure 2). As these 
different exons were sequenced repeatedly on at least 10 distinct genomes, we 
are confident that the sequence of Fig. 2 represents an authentic sequence and 
does not contain minor polymorphic variants. Furthermore, these modifications 
increase the local similarity with the rat nCL1 amino acid sequence (Sorimachi), 
although the overall similarity is still 94 %. 
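
The figures quoted above can be cross-checked with simple codon arithmetic. The following is a minimal sketch that uses only numbers stated in the text and in the legend of Figure 2; the variable names are illustrative and are not part of the original description.

```python
# Codon arithmetic on figures quoted in the description (names are illustrative).
CODON_LENGTH = 3  # nucleotides per codon

missing_bases = 129    # bases absent from the earlier published cDNA
protein_length = 821   # amino acids encoded between the ATG and the TGA
cds_length = 2466      # bp from the ATG through the TGA stop codon

# 129 bases should correspond to a whole number of N-terminal residues.
missing_residues = missing_bases // CODON_LENGTH

# The coding sequence is one codon per residue plus the stop codon.
expected_cds = (protein_length + 1) * CODON_LENGTH

# 12 sequence differences: 3 silent third-base changes and 9 amino acid changes.
silent, amino_acid_changing = 3, 9
total_differences = silent + amino_acid_changing

print(missing_residues, expected_cds, total_differences)
```

The three values agree with the 43 amino acids, the 2466 bp coding sequence and the 12 differences reported in the text.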

The ATG numbered 1 in Figure 2 is the translation initiation site based on 
homology with the rat nCL1, and is within a sequence with only 5 nucleotides out 
of 8 in common with the Kozak consensus sequence (Kozak M., 1984). Putative 
CCAAT and TATA boxes were observed 590 and 324 bp (CCAAT) and 544 or 33 bp 
(TATA) upstream of the initiating ATG codon, respectively (Bucher, 1990). A 
GC-box binding the Sp1 protein (Dynan et al., 1983) was identified at position -477. 
Consensus sequences corresponding to potential muscle-specific regulatory 
elements were identified (Fig. 2). These include a myocyte-specific enhancer-binding 
factor 2 (MEF2) binding site (Cserjesi P., 1991), a CArG box (Minty A., 
1986) and 6 E-boxes (binding sites for basic Helix-Loop-Helix proteins frequently 
found in members of the MyoD family; Blackwell and Weintraub, 1990). The functional 
significance of these putative transcription factor binding sites in the regulation 
of nCL1 gene expression remains to be established. 

Two potential AAUAAA polyadenylation signals were identified 520 and 777 
bp downstream of the TGA stop codon. The sequencing of a partial nCL1 cDNA 
containing a polyA tail demonstrated that the first AAUAAA is the 
polyadenylation signal. The latter is embedded in a region well conserved with 
the rat nCL1 sequence and is followed after 4 bp by a G/T cluster, present in 
most genes 3' of the polyadenylation site (Birnstiel et al., 1985). The 3' 
untranslated region of the nCL1 mRNA is 565 bp long. The predicted length of 
the cDNA should therefore be approximately 3550 or 3000 bp. 
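
The two predicted lengths follow from adding the 5' untranslated region, the coding sequence and the 3' untranslated region for each candidate TATA box. The sketch below illustrates that sum. The 30 bp distance between a TATA box and the transcription start site is an assumption introduced here for illustration (the text does not state the value used), and the poly(A) tail is ignored.

```python
# Rough transcript-length estimate for each candidate TATA box.
CDS_BP = 2466           # ATG through TGA, from the Figure 2 legend
UTR3_BP = 565           # 3' untranslated region, as stated in the text
TATA_TO_START_BP = 30   # ASSUMPTION: typical TATA-to-start spacing, not from the text


def predicted_mrna_length(tata_upstream_bp, tata_to_start_bp=TATA_TO_START_BP):
    """Return 5' UTR + coding sequence + 3' UTR, excluding the poly(A) tail."""
    utr5_bp = tata_upstream_bp - tata_to_start_bp
    return utr5_bp + CDS_BP + UTR3_BP


# The TATA boxes lie 544 or 33 bp upstream of the initiating ATG.
lengths = {tata: predicted_mrna_length(tata) for tata in (544, 33)}
print(lengths)
```

Under this assumption the estimates come out close to the 3550 and 3000 bp quoted above; a different spacing shifts both values by the same amount.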
b) Comparison with calpains 

The sequence of the human nCL1 gene was compared to those of other 
calpains (Figure 3). The most telling comparisons are with the 
homologous rat (Accession no J05121), bovine (Accession no U07858) and 
porcine (Accession no U05678) sequences. The accession numbers refer to 
those of international gene banks, such as GenBank (N.I.H.) or the EMBL Database 
(EMBL, Heidelberg). High local similarities between the human and rat DNA 
sequences are even observed in the 5' (75%) or in different parts of the 3' 
untranslated regions (over 60%) (data not shown). The high extent of sequence 
homology manifested by the human and rat nCL1 gene in their untranslated 
regions is suggestive of evolutionary pressures on common putative regulatory 
sequences. 

c) Genomic organisation of the nCL1 gene 

A comparison of the published nCL1 human cDNA (Sorimachi et al., 1989) 
with the corresponding genomic sequence led to the identification of 24 exons 
ranging in length from 12 bp (exon 13) to 309 bp (exon 1), with a mean size of 
100 bp (Figure 1). The size of introns ranges from 86 bp to about 10-16 kb for 
the first intron. 
The intron-exon boundaries as shown in Table 2 exhibit close adherence to 
5' and 3' splice site consensus sequences (Shapiro and Senapathy, 1987). 
Table 2. Sequences at the intron-exon junctions. A score expressing adherence 
to the consensus was calculated for each site according to Shapiro and 
Senapathy (1987). Sequences of exons and introns are in upper and lower 
case, respectively. Sizes of exons are given in parentheses. 

splice donor site   score (%)   Intron      score (%)   splice acceptor site       Exon 
                                                                                   Exon 1 (309 bp) 
...CTCCGgTgagi...   88.5        Intron 1    99.0        ...tmtgtttcacagGAAAT...    Exon 2 (70 bp) 
...GCTAGgtagga...   83.5        Intron 2    90.0        ...gtgtagcagcagGGGAC...    Exon 3 (119 bp) 
...TCCAGgtgagg...   92          Intron 3    81.5        ...acgaiagtgcagTTCTG...    Exon 4 (134 bp) 
...GCTAAgiaagc...   82          Intron 4    81.5        ...atcactactaagGCTCC...    Exon 5 (169 bp) 
...TTGATgtaagi...   87          Intron 5    79.5        ...ccaicgggcacagGATGG...   Exon 6 (144 bp) 
...CCCGGgigigi...   77.5        Intron 6    91          ...liactgctctacagACAAT...  Exon 7 (84 bp) 
...ATGAGginncc...   94          Intron 7    78.5        ...icigigtgctiaagGTCCC...  Exon 8 (86 bp) 
...GATAGgiaggi...   89          Intron 8    91.5        ...caitiicccaccagATGGA...  Exon 9 (78 bp) 
...TTCTGgtgagi...   88          Intron 9    92          ...itccaacactcagGATGT...   Exon 10 (161 bp) 
...CCCAGgTggga...   80          Intron 10   68.5        ...ctgggggtgcagATACT...    Exon 11 (170 bp) 
...ACGAGgigtgt...   85.5        Intron 11   86          ...igttiatacaagGTTCC...    Exon 12 (12 bp) 
...AAGAGgiaiag...   70          Intron 12   87          ...iccccatcictcagATGCA...  Exon 13 (209 bp) 
...TCTGAgigagi...   76.5        Intron 13   97          ...tgiaucacacagGGAAG...    Exon 14 (37 bp) 
...CAGTGgigagt...   89          Intron 14   93.5        ...amctiaigcagAAAAA...     Exon 15 (18 bp) 
...CCAAGgtaggi...   89          Intron 15   87          ...ccicctciciccagCCCAT...  Exon 16 (114 bp) 
...CACAGgtgtci...   80          Intron 16   88          ...iigtgcciccacagCCACA...  Exon 17 (78 bp) 
...GAGATgtgagi...   84          Intron 17   92.5        ...xccncacacagGACAT...     Exon 18 (58 bp) 
...CAAACgigagt...   83          Intron 18   90          ...xiccaiccccccagACAAG...  Exon 19 (65 bp) 
...TGGATgiaicc...   56          Intron 19   88          ...ccicccicciccagACAGA...  Exon 20 (69 bp) 
...GGCAGgiggga...   80          Intron 20   94          ...miciaugccagAAATA...     Exon 21 (79 bp) 
...CGCAGgtpcig...   66          Intron 21   91          ...ggtcccaccacagGATTC...   Exon 22 (117 bp) 
...GTrCAgtaagt...   79          Intron 22   93.5        ...gcattaticacagGAGCT...   Exon 23 (59 bp) 
...TGGAGptaaap...   81          Intron 23   79          ...gg.act.cmc..TnnCT...    Exon 24 (27 bp) 

When the genomic sequence was submitted to GRAIL analysis (Uberbacher 
et al., 1991), 11 exons were correctly recognised, 4 were not identified, 6 were 
inadequately defined and 2 were too small to be recognised (data not shown). 
As already noted, the nCL1 gene has three unique sequence blocks: NS 
(amino acid residues 1 to 61), IS1 (residues 267 to 329) and IS2 (residues 578 
to 653). It is interesting to note that each of these sequences, as well as the 
nuclear translocation signal inside IS2, are essentially flanked by introns (Fig. 4). 
The exon-intron organisation of the human nCL1 is similar to that reported for 
the chicken CANP, the only other large subunit calpain gene whose genomic 
structure is known (Emori et al., 1986). 

Four microsatellite sequences were identified. Two of them are in the distal 
part of the first intron: an (AT)14 and a previously identified mixed-pattern 
microsatellite, S774G4B8, which was demonstrated to be non-polymorphic 
(Fougerousse et al., 1994). A (TA)7(CA)4(GA)13 was identified in the second 
intron and genotyping of 64 CEPH unrelated individuals revealed two alleles 
(with frequencies of 0.10 and 0.90). The fourth microsatellite is a mixed 
(CA)n(TA)m repeat present in the 9th intron. The latter and the (AT)14 repeat 
have not been investigated for polymorphism. Fourteen repetitive sequences of 
the Alu family and one Mer2 repeat were identified in the nCL1 gene (Fig. 1C), 
which has, thus, on average one Alu element per 2.5 kb. 

Southern blot experiments (Ohno et al., 1989) and STS screening (data not 
shown) suggest that there is but one copy per genome of this member of the 
calpain family. 

EXAMPLE 3 

Expression of the nCL1 gene 

The pattern of tissue-specificity was investigated by northern blot 
hybridisation with a genomic subclone probe from cosmid 1F11 spanning exons 
20 and 21. There is no evidence for the existence of an alternatively spliced form 
of nCL1, although this cannot be excluded. A transcript of about 3.4-3.6 kb was 
detected in skeletal muscle mRNA (Figure 5). This size therefore favours 
position -544 as the functional TATA box. 

Transcription studies suggested that it is an active gene rather than a 
pseudogene and its muscle-specific pattern of expression is consistent with the 
phenotype of this disorder (Sorimachi et al., 1989 and Figure 5). 

EXAMPLE 4 

Mutation screening 

nCL1 fulfils both positional and functional criteria to be a candidate gene for 
LGMD2A. To evaluate its role in the etiology of this disorder, nCL1 was 
systematically screened in 38 LGMD2 families for the presence of nucleotide 
changes using a combination of heteroduplex (Keen et al., 1991) and direct 
sequence analyses. 

PCR primers were designed to specifically amplify the exons and splice 
junctions and also the regions containing the putative CAAT and TATA boxes and the 
polyadenylation signal of the gene, as shown in Table 3. 
Table 3. PCR primers used for the analysis of the nCL1 gene in LGMD patients 



amplified region 
promoter 
exon 1 
exon 2 
exon 3 
exon 4 
exon 5 
exon 6 
exon 7 
exon 8 
exon 9 
exon 10 
exon 1 1 
exon 12 
exon 13 



TTCAGTACCTCCCGTTCACC 
GATGCTTGAGCCAGGAAAAC 
CTTTCCTTGAAGGTAGCTGTAT 
GAGGTGCTGAGTGAGAGGAC 
ACTCCGTCTCAAAAAAATACCT 
ATTGTCCCTTTACCTCCTGG 
TGGAAGTAGGAGAGTGGGCA 
GGGTAGATGGGTGGGAAGTT 
GAGGAATGTGGAGGAAGGAC 
TTCCTGTGAGTGAGGTCTCG 
GGAACTCTGTGACCCCAAAT 
TCCTCAAACAAAACATTCGC 
GTTCCCTACATTCTCCATCG 
GTTATTTCAACCCAGACCCTT 
AATGGGTTCTCTGGTTACTGC 
AGCACGAAAAGCAAAGATAAA 
GTAAGAGATTTGCCCCCCAG 
TCTGCGGATCATTGGTTTTG 
CCTTCCCTTCTTCCTGCTTC 
CTCTCTTCCCCACCCTTACC 
CCTCCTCACCTGCTCCCATA 
TTTTTCGGCTTAGACCCTCC 
TGTGGGGAATAGAAATAAATGG 
CCAGGAGCTCTGTGGGTCA 
GGCTCCTCATCCTCATTCACA 
GTGGAGGAGGGTGAGTGTGC 
TGTGGCAGGACAGGACGTTC 



Size (bp) 
296 
438 
239 
354 
292 
325 
315 
333 
321 
173 
251 
355 
312 
337 



60 

57 

58 

59 

56 

57 

56 

58 

56 

56 

57 

61 

60 
exon 14 
exon 15 
exon 16 
exon 17 
exon 18 
exon 19 
exons 20-21 

exon 22 
exons 22-23 
exon 24 
polyadenylation signal 



14 

TTCAACCTCTGGAGTGGGCC 
CACCAGAGCAAACCGTCCAC 
ACAGCCCAGACTCCCATTCC 
TTCTCTTCTCCCTTCACCCT 
ACACACTTCATGCTCTCTACCC 
CCGCCTATTCCTTTCCTCTT 
GACAAACTCCTGGGAAGCCT 
ACCTCTGACCCCTGTGAACC 
TGTGGATTTGTGTGCTACGC 
CATAAATAGCACCGACAGGGA 
GGGATGGAGAAGAGTGAGGA 
TCCTCACTCTTCTCCATCCC 
ACCCTGTATGTTGCCTTGG 
GGGGATTTTGCTGTGTGCTG 
ATTCCTGCTCCCACCGTCTC 
CACAGAGTGTCCGAGAGGCA 
GGAGATTATCAGGTGAGATGCC 
CAGAGTGTCCGAGAGGCAGGG 
CGTTGACCCCTCCACCTTGA 
GGGAAAACATGCACCTTCTT 
TAGGGGGTAAAATGGAGGAG 
ACTAACTCAGTGGAATAGGG 
GGAGCTAGGATAGOTrAaT 



230 


6] 


225 


57 


331 


56 


270 


61 


258 


59 


159 


57 


333 


61 


282 


57 


608 


61 


375 


58 


413 


56 






PCR products made on DNA from blood of specific LGMD2A patients were 
then subjected either to heteroduplex analysis or to direct sequencing 
depending on whether the mutation, based on haplotype analysis, was expected 
to be homozygous or heterozygous, respectively. It was occasionally necessary 
to clone the PCR products to precisely identify the mutations (i.e. for 
microdeletions or insertions and for some heterozygotes). Disease-associated 
mutations are summarised in Table 4 hereunder and their position along the 
protein is shown in Fig. 4. 

Table 4. nCL1 mutations in LGMD2A families. 

Codons and amino acid positions are numbered on the basis of the cDNA 
sequence starting from ATG. 

Exon  Families           Nucleotide  Nucleotide     Amino acid  Amino acid  Restriction 
                         position    change         position    change      site 
2     B519*              328         CGA -> TGA     110         Arg->stop 
4     M42                545         CTG -> CAG     182         Leu->Gln 
4     M1394; M2888       550         CAA -> CA      184         frameshift 
5     M35; M37           701         GGG -> GAG     234         Gly->Glu 
6     M32                945         CGG -> CG      315         frameshift  -SmaI 
8     M2407*             1061        GTG -> GGG     354         Val->Gly 
8     M1394              1079        TGG -> TAG     360         Trp->stop   -BstNI 
11    M2888              1468        CGG -> TGG     490         Arg->Trp 
13    R12*               1715        CGG -> CAG     572         Arg->Gln    -MspI 
19    R27                2069-2070   deletion AC    690         frameshift 
21    R14; R17           2230        AGC -> GGC     744         Ser->Gly    -AluI 
22    A*; B501*; M32     2306        CGG -> CAG     769         Arg->Gln 
22    B505               2313-2316   deletion AGAC  771-772     frameshift 
22    R14; B505          2362-2363   AG -> TCATCT   788         frameshift 


The first letter of the family code refers to the origin of the population: B = Brazil, 
M = metropolitan France, R = Isle of La Reunion, A = Amish. 
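
Because codons are numbered from the ATG, the amino acid position of each mutation in Table 4 follows directly from its nucleotide position. The following minimal sketch checks the table entries against that rule. The helper name `codon_number` is illustrative, and the value 110 for nucleotide 328 is the one implied by the numbering rule, since the scanned entry is hard to read.

```python
def codon_number(cdna_pos):
    """Return the 1-based codon containing a 1-based cDNA position (A of ATG = 1)."""
    if cdna_pos < 1:
        raise ValueError("cDNA positions are numbered from 1")
    return (cdna_pos - 1) // 3 + 1


# Nucleotide position -> amino acid position, as listed in Table 4.
TABLE4 = {
    328: 110, 545: 182, 550: 184, 701: 234, 945: 315, 1061: 354, 1079: 360,
    1468: 490, 1715: 572, 2069: 690, 2230: 744, 2306: 769, 2362: 788,
}

# Collect any entry where the listed codon disagrees with the numbering rule.
mismatches = {pos: (codon_number(pos), listed)
              for pos, listed in TABLE4.items()
              if codon_number(pos) != listed}

print(mismatches)
print(codon_number(2313), codon_number(2316))
```

An empty result means every listed amino acid position is consistent with its nucleotide position; the four-base deletion at 2313-2316 spans the two codons given in the table.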

Each mutation was confirmed by heteroduplex analysis, by sequencing of 
both strands in several members of the family or by enzymatic digestion when 
the mutation resulted in the modification of a restriction site. Segregation 
analyses of the mutations, performed on DNAs from all available members of the 
families, confirmed that these sequence variations are on the parental 
chromosome carrying the LGMD2A mutation. To exclude the possibility that the 
missense substitutions might be polymorphisms, their presence was 
systematically tested in a control population: none of these mutations was seen 
among 120 control chromosomes from the CEPH reference families. 
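
The size of the control sample determines how rare a variant must be to go unseen. As a hedged illustration, the sketch below computes the largest allele frequency that is still compatible, at a chosen confidence level, with observing no carrier chromosome among n independent chromosomes. The 95 % level and the function name are assumptions made for this example and do not come from the text.

```python
def upper_bound_allele_frequency(n_chromosomes, confidence=0.95):
    """Largest frequency p for which seeing zero copies in n draws is still plausible.

    Solves (1 - p) ** n = 1 - confidence for p, assuming independent chromosomes.
    """
    if n_chromosomes < 1:
        raise ValueError("at least one chromosome is required")
    return 1.0 - (1.0 - confidence) ** (1.0 / n_chromosomes)


bound = upper_bound_allele_frequency(120)  # 120 control chromosomes, as in the text
rule_of_three = 3 / 120                    # common quick approximation to the same bound

print(round(bound, 4), rule_of_three)
```

With 120 control chromosomes, a variant carried by a few percent of chromosomes in the control population would very probably have been detected, whereas much rarer polymorphisms cannot be excluded by this test alone.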

EXAMPLE 5 

Analysis of chromosome 15 ascertained families 
The initial screening for causative mutations was performed on families, 
each containing a LGMD gene located on chromosome 15. These included 
families from the Island of La Reunion (Beckmann et al., 1991), from the Old 
Order Amish from northern Indiana (Young et al., 1992) and 2 Brazilian families 
(Passos-Bueno et al., 1993). 
a) Reunion Island families 

Genealogical studies and geographic isolation of the families from the Isle 
of La Reunion were suggestive of a single founder effect. Genetic analyses are, 
however, inconsistent with this hypothesis as the families present haplotype 
heterogeneity. At least six different carrier chromosomes are encountered (with 
affected individuals in several families being compound heterozygotes). Distinct 
mutations corresponding to four of these six haplotypes have been identified 
thus far. 

In family R14, exons 13, 21 and 22 showed evidence for sequence variation 
upon heteroduplex analysis (Fig. 6). Sequencing of the associated PCR products 
revealed (i) a polymorphism in exon 13, (ii) a missense mutation (A->G) in exon 
21 transforming the Ser744 residue to a glycine in the loop of the second EF-hand 
in domain IV of the protein (Figure 4), and (iii) a frameshift mutation in exon 
22. The exon 21 mutation and the polymorphism in exon 13 form a haplotype 
which is also encountered in family R17. Subcloning of the PCR products was 
necessary to identify the exon 22 mutation. Sequencing of several clones 
revealed a replacement of AG by TCATCT (data not shown). This frameshift 
mutation causes premature termination at nucleotide 2400, where an in-frame 
stop codon occurs (Figure 4). 

The affected individuals in family R12 are homozygous for all markers of the 
LGMD2A interval (Allamand, submitted). Sequencing of the PCR products of 
exon 13 revealed a G to A transition at base 1715 of the cDNA resulting in a 
substitution of glutamine for Arg572 (Figure 7) within domain III, a residue which 
is highly conserved throughout all known calpains. This mutation, detectable by 
loss of an MspI restriction site, is present only in this family and in no other 
examined LGMD2A families or unrelated controls. 
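
Detection by loss of a restriction site works because a single base change can destroy the enzyme's recognition sequence. The sketch below illustrates the principle for MspI, whose recognition sequence is CCGG. The ten-base fragment is hypothetical and was chosen only so that a CGG codon is preceded by a C; it is not the actual nCL1 sequence around base 1715.

```python
MSPI_SITE = "CCGG"  # MspI recognition sequence


def site_positions(sequence, site):
    """Return the 0-based start index of every occurrence of a recognition site."""
    width = len(site)
    return [i for i in range(len(sequence) - width + 1)
            if sequence[i:i + width] == site]


def substitute(sequence, index, base):
    """Return a copy of the sequence with one base replaced at a 0-based index."""
    return sequence[:index] + base + sequence[index + 1:]


# HYPOTHETICAL fragment: the CGG codon occupies indices 4-6 and follows a C,
# so the wild-type fragment carries one CCGG site.
wild_type = "GATCCGGTCA"
mutant = substitute(wild_type, 5, "A")  # CGG -> CAG, as for the change at base 1715

print(site_positions(wild_type, MSPI_SITE), site_positions(mutant, MSPI_SITE))
```

In a digestion assay, the PCR product from the wild-type allele would be cut at the site while the product from the mutant allele would remain intact, which is what makes the change detectable on a gel.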

In family R27, heteroduplex analysis followed by sequencing of the PCR 
products of an affected child revealed a two base pair deletion in exon 19 
(Figure 6 and Table 4). One AG out of three is missing at this position of the 
sequence, producing a stop codon at position 2069 of the cDNA sequence 
(Figure 4). 

b) Amish families 

As expected, due to multiple consanguineous links, the examined LGMD2A
Northern Indiana Amish patients were homozygous for the haplotype on the
chromosome bearing the mutant allele (Allamand, submitted). A (G->A)
missense mutation was identified at nucleotide 2306 within exon 22 (Fig. 7). The
resulting codon change is CGG to CAG, transforming Arg769 to glutamine. This
residue, which is conserved throughout all members of the calpain family in all
species, is located in domain IV of the protein within the 3rd EF-hand at the
helix-loop junction (ref). This mutation was encountered in a homozygous state
in all patients from 12 chromosome 15-linked Amish families, in agreement with
the haplotype analysis. We also screened six Southern Indiana Amish LGMD
families, for which the chromosome 15 locus was excluded by linkage analyses
(Allamand ESHG, submitted, ASHG 94). As expected, this nucleotide change
was not present in any of the patients from these families, thus confirming the
genetic heterogeneity of this disease in this genetically related isolate.
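
The codon change described above (CGG to CAG, arginine to glutamine) can be reproduced with the standard genetic code. The sketch below is illustrative Python only; the table is built from the standard code in TCAG order, and the function name `point_mutation_effect` is a hypothetical helper, not something defined in the specification.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one-letter amino acids, codons enumerated in TCAG order.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def point_mutation_effect(codon: str, index: int, new_base: str):
    """Return (mutant codon, original residue, mutant residue)."""
    mutant = codon[:index] + new_base + codon[index + 1:]
    return mutant, CODON_TABLE[codon], CODON_TABLE[mutant]


# G -> A at the second position of the CGG codon (Arg769 in the text).
print(point_mutation_effect("CGG", 1, "A"))
```

The call returns the mutant codon together with the one-letter residues before and after the change (R for arginine, Q for glutamine).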

c) Brazilian families

As a result of consanguineous marriages, two Brazilian families (B501,
B519) are homozygous for extended LGMD2A carrier haplotypes (data not
shown). Sequencing PCR products from affected individuals of these families
demonstrated that family B501 has the same exon 22 mutation found in northern
Indiana Amish patients (Figure 7), but embedded in a completely different
haplotype. In family B519, the patients carry a C to T transition in exon 2,
replacing Arg328 with a TGA stop codon (Figure 7), thus leading, presumably, to
a severely truncated protein (Figure 4).

d) Analysis of other LGMD families 

Having validated the role of the candidate gene in the chromosome 15 
ascertained families, we next examined by heteroduplex analysis LGMD families 
for which linkage data were not informative. These included one Brazilian (B505) 
and 13 metropolitan French pedigrees. 

Heteroduplex bands were revealed for exons 1, 3, 4, 5, 6, 8, 11 and 22 of one
or more patients (Figure 6). Of all sequence variants, 10 were identified as
possible pathogenic mutations (5 missense, 1 nonsense and 4 frameshift
mutations) and 3 as polymorphisms with no change of amino acid of the protein.
All causative mutations identified are listed in Table 4 above. Identical
mutations were uncovered in apparently unrelated families. The mutations
shared by families M35 and M37, and M2888 and M1394, respectively, are likely
to be the consequence of independent events since they are embedded in
different marker haplotypes. In contrast, it is likely that the point mutation in exon
22 of the Amish and in the M32 kindreds corresponds to the same mutational
event, as both chromosomes share a common four-marker haplotype
(774G4A1-774G4A10-774G454D-774G4A2) around nCL1 (data not shown), possibly
reflecting a common ancestor. The same holds true for the AG to TCATCT
substitution mutation encountered in exon 22 in families B505 and R14. The
exon 8 (T->G) transversion is present in the two carrier chromosomes of M2407,
the only metropolitan family homozygous by haplotype, possibly reflecting an
undocumented consanguinity. For some families, no disease-causing mutation
has been detected thus far (M40 for example).

In addition to the polymorphism present in exon 13 in families R14 and R17
(position 668) and in the intragenic microsatellites, four additional neutral
variations were detected: a (T->C) transition at position 96, abolishing a DdeI
restriction site in exon 1 in M31; a (C->T) transition in exon 3 (position 495) in
M40 and in M37, forming a haplotype with the exon 5 mutation (in the former
family, this polymorphism does not cosegregate with the disease); a (T->C)
transition in the paternally derived promoter in M42 at position -428, which was
also evidenced in healthy controls; and a variable poly(G) in intron 22 close to
the splice site in families R20, R11, R19, M35 and M37. The latter is also
present in the members of the CEPH families, but is not useful as a genetic
marker as the visualisation and interpretation of mononucleotide repeat alleles is
difficult.

In total, sixteen independent mutational events representing fourteen
different mutations were identified. All mutations cosegregate with the disease in
LGMD2A families. The characterised morbid calpain alleles contain nucleotide
changes which were not found in alleles from normal individuals. The discovery of
two nonsense and five frameshift mutations in nCL1 supports the hypothesis that
a deficiency of this product causes LGMD2A. All seven mutations result in a
premature in-frame stop codon, leading to the production of truncated and
presumably inactive proteins (Figure 4). Evidence for the morbidity of the
missense mutations comes from (1) the relatively high incidence of such mutations
among LGMD2A patients; although it is difficult in the absence of functional
assays to differentiate between a polymorphism and a morbid mutation, the
occurrence of different "missense" mutations in this gene cannot all be
accounted for as rare private polymorphisms; (2) the failure to observe these
mutations in control chromosomes; and (3) the occurrence of mutations in
evolutionarily conserved residues and/or in regions of documented functional
importance. Four of seven missense mutations change an amino acid which is
conserved in all known members of the calpain family in all species (Figure 3).
Two of the remaining mutations affect less conserved amino acid residues, but
are located in important functional domains. The substitution V354G in exon 8 is
4 residues before the asparagine at the active site, and S744G in exon 21 is
within the loop of the second EF-hand and may impair the calcium-dependent
regulation of calpain activity or the interaction with a small subunit (Figure 4).
Several missense mutations change a hydrophobic residue to a polar one, or
vice versa (Table 4), possibly disrupting higher order structures.
METHODS 

Description of the patients 

The LGMD2A families analysed were from 4 different geographic origins.
They included 3 Brazilian families, 13 interrelated nuclear families from the Isle
of la Reunion, 10 French metropolitan families and 12 US Amish families. The
majority of these families were previously ascertained to belong to the
chromosome 15 group by linkage analysis (Beckmann, 1991; Young, Passos-Bueno
et al., 1993). However, some families from metropolitan France as well as
one Brazilian family, B505, had non-significant lod scores for chromosome 15.
Genomic DNA was obtained from peripheral blood lymphocytes.

Sequencing of cosmid c774G4-1F11 and EcoRI restriction map of cosmids

Cosmid 1F11 (Figure 1C) was subcloned following DNA preparation through
the Qiagen procedure (Qiagen Inc., USA) and partial digestion with either Sau3A,
RsaI or AluI. Size-selected restriction fragments were recovered from low-melting
agarose and eventually ligated with M13 or Bluescript (Stratagene, USA)
vectors. After electroporation in E. coli, recombinant colonies were picked in 100
µl of LB/ampicillin media. PCR reactions were performed on 1 µl of the culture in
10 mM Tris-HCl, pH 9.0, 50 mM KCl, 1.5 mM MgCl2, 0.1% Triton X-100, 0.01%
gelatine, 200 µM of each dNTP, 1 U of Taq Polymerase (Amersham) with 100 ng
of each vector primer. Amplification was initiated by 5 min denaturation at
95°C, followed by 30 cycles of 40 sec denaturation at 92°C and 30 sec annealing
at 50°C. PCR products were purified through Microcon devices (Amicon, USA)
and sequenced using the dideoxy chain termination method on an ABI
sequencer (Applied Biosystems, Foster City, USA). The sequences were
analysed and alignments performed using the XBAP software of the Staden
package, version 93.9 (Staden, 1982). Gaps between sequence contigs were
filled by walking with internal primers. The EcoRI restriction map of cosmids was
performed essentially as described in Sambrook et al. (1989).

Northern blot analysis

The probes were labelled by random priming with [α-32P]dCTP.
Hybridisation was performed to human multiple tissue northern blots as
recommended by the manufacturer (Clontech, USA).

Analysis of PCR products from LGMD2A families

One hundred ng of human DNA were used per PCR under the buffer and
cycle conditions described in Fougerousse (1994) (annealing temperature shown
in Table 3). Heteroduplex analysis (Keene et al., 1991) was performed by
electrophoresis of ten µl of PCR products on 1.5 mm-thick Hydrolink MDE gels
(Bioprobe) at 500-600 V for 12-15 h depending on the fragment length.
The migration profile was visualised under UV after ethidium bromide staining.

For sequence analysis, the PCR products were subjected to dye-dideoxy
sequencing, after purification through Microcon devices (Amicon, USA). When
necessary, depending on the nature of the mutations (e.g., frameshift mutation or
for some heterozygotes), the PCR products were cloned using the TA cloning kit
from Invitrogen (UK). One µl of product was ligated to 25 ng of vector at 12°C
overnight. After electroporation into XL1-Blue bacteria, several independent
clones were analysed by PCR and sequenced as described above.

The invention results from the finding that the nCL1 gene, when it is mutated,
is involved in the etiology of LGMD2A. This is exactly the contrary of what is stated
in the literature, e.g. that the disease is accompanied by the presence of a
deregulated calpain. Identification of nCL1 as the defective gene in LGMD2A
represents the first example of muscular dystrophy caused by mutation affecting
a gene which is not a structural component of muscle tissue, in contrast with
previously identified muscular dystrophies such as Duchenne and Becker
(Bonilla et al., 1988), severe childhood autosomal recessive (Matsumara et al.,
1992), Fukuyama (Matsumara et al., 1993) and merosin-deficient congenital
muscular dystrophies (Tome et al., 1994).

The understanding of the LGMD2A phenotype needs to take into account
the fact that there is no active nCL1 protein in several patients, a loss compatible
with the recessive manifestation of this disease. Simple models in which this
protease would be involved in the degradation or destabilisation of structural
components of the cytoskeleton, extracellular matrix or dystrophin complex can
therefore be ruled out. Furthermore, there are no signs of such alterations by
immunocytogenetic studies on LGMD2 muscle biopsies (Matsumara et al., 1993;
Tome et al., 1994). Likewise, since LGMD2A myofibers are apparently not
different from other dystrophic ones, it seems unlikely that this calpain plays a
role in myoblast fusion, as proposed for ubiquitous calpains (Wang et al., 1989).

All the data disclosed in these examples confirm that the nCL1 gene is a
major gene involved in the disease when mutated.

The fact that morbidity results from the loss of an enzymatic activity raises 
hopes for novel pharmaco-therapeutic prospects. The availability of transgenic 
models will be an invaluable tool for these investigations. 

The invention also relates to the use of a nucleic acid or a sequence of
nucleic acid of the invention, or to the use of a protein coded by the nucleic acid,
for the manufacturing of a drug for the prevention or treatment of LGMD2.

The finding that a defective calpain underlies the pathogenesis of LGMD2A 
may prove useful for the identification of the other loci involved in the LGMDs. 
Other forms of LGMD may indeed be caused by mutations in genes whose 
products are the CANP substrates or in genes involved in the regulation of nCL1
expression. Techniques such as the two-hybrid selection system (Fields et al., 
1989) could lend themselves to the isolation of the natural protein substrate(s) of 
this calpain, and thus potentially help to identify other LGMD loci. 

The invention also relates to the use of all or a part of the peptidic sequence
of the enzyme, or of the enzyme itself, product of the nCL1 gene, for the screening of the
ligands of this enzyme, which might also be involved in the etiology and the
morbidity of LGMD2.

The ligands which might be involved are for example substrate(s), activators 
or inhibitors of the enzyme. 

The nucleic acids of the invention might also be used in a screening method
for the determination of the components which may act on the regulation of the
gene expression.

A process of screening using either the enzyme or a host recombinant cell,
containing the nCL1 gene and expressing the enzyme, is also a part of the
invention.

The pharmacological methods, and the use of nucleic acid and peptidic 
sequences of the invention are very potent applications. 

The methods used for such screenings of ligands or regulatory elements are 
those described for example for the screening of ligands using cloned receptors. 

The identification of mutations in the nCL1 gene provides the means for 
direct prenatal or presymptomatic diagnosis and carrier detection in families in 
which both mutations have been identified. Gene-based accurate classification 
of LGMD2A families should prove useful for the differential diagnosis of this 
disorder. 

The invention relates to a method of detection of a predisposition to LGMD2 
in a family or a human being, such method comprising the steps of : 

- selecting one or more exons or flanking sequences which are sensitive in 
said family; 

- selecting the primers specific for the exon or exons or their flanking
sequences, a specific example being the PCR primers of Table 3, or a hybrid
thereof;

- amplifying the nucleic acid sequence, the substrate for this amplification 
being the DNA of the human being to be checked for the predisposition, and 

- comparing the amplified sequence to the corresponding sequence derived
from Figure 2 or Figure 8.
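
The final comparison step of the method above can be sketched as a position-by-position check. This is a minimal Python illustration under stated assumptions: both sequences are already aligned and of equal length, so only substitutions are reported (insertions and deletions would require a proper alignment first). The function name `find_differences` and the short sequences are hypothetical and are not taken from Figure 2 or Figure 8.

```python
from typing import List, Tuple


def find_differences(reference: str, amplified: str) -> List[Tuple[int, str, str]]:
    """List (1-based position, reference base, amplified base) for each mismatch.

    Assumes the two sequences are pre-aligned and equally long.
    """
    if len(reference) != len(amplified):
        raise ValueError("sequences must be aligned to the same length")
    pairs = zip(reference.upper(), amplified.upper())
    return [
        (position, ref_base, amp_base)
        for position, (ref_base, amp_base) in enumerate(pairs, start=1)
        if ref_base != amp_base
    ]


# Hypothetical reference and patient-derived fragments (illustration only).
reference_fragment = "GACCGGTCA"
patient_fragment = "GACCAGTCA"
print(find_differences(reference_fragment, patient_fragment))
```

Each reported mismatch would then need to be interpreted against the known mutations and polymorphisms before any conclusion about predisposition is drawn.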

Table 2 indicates the sequences of the introns-exons junctions, and primers 
comprising in their structure these junctions are also included in the invention. 

All other primers suitable for such RNA or DNA amplification may be used in 
the method of the invention. 

In the same way, any suitable amplification method, such as PCR (Polymerase
Chain Reaction ®), NASBA ® (Nucleic Acid Sequence Based Amplification)
or others, might be used.

The methods usually used in the detection of single-site mutations, like ASO
(Allele specific PCR), LCR, or ARMS (Amplification Refractory Mutation System),
may be implemented with the specific primers of the invention.
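
As an illustration of the allele-specific principle behind ARMS, a primer can be designed so that its 3' terminal base sits on the mutation site and matches only one allele. The sketch below is a simplified Python example; the function name `arms_forward_primers`, the primer length and the sequence are assumptions made for illustration, and refinements used in practice (such as an additional deliberate mismatch near the 3' end) are omitted.

```python
from typing import Tuple


def arms_forward_primers(
    reference: str, position: int, mutant_base: str, length: int = 20
) -> Tuple[str, str]:
    """Return (normal primer, mutant primer), each ending on `position`.

    `position` is the 0-based index of the variable base in `reference`.
    """
    start = position - length + 1
    if start < 0:
        raise ValueError("not enough upstream sequence for this primer length")
    stem = reference[start:position]  # bases shared by both primers
    return stem + reference[position], stem + mutant_base


# Hypothetical template; the variable base is the C at index 5.
template = "ACGTACGTAC"
normal_primer, mutant_primer = arms_forward_primers(template, 5, "T", length=4)
print(normal_primer, mutant_primer)
```

The two primers differ only in their last base, so each is expected to prime efficiently only on the allele it matches.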

The primers, such as described in Tables 1 and 3, or including junctions of
Table 2, or more generally including the flanking sequences of one of the 24
exons, are also a part of the invention.

The kit for the detection of a predisposition to LGMD2 by nucleic acid
amplification is also in the scope of the invention; such a kit comprises at least
PCR primers selected from the group of :

a) those described in Table 1;

b) those described in Table 3;

c) those including the introns-exons junctions of Table 2;

d) those derived from primers defined in a), b) or c).

The nucleic acid sequence of claim 1 to 3 might be inserted in a viral or a 
retroviral vector, said vector being able to transfect a packaging cell line. 

The transfected packaging cell line might be used as a drug for gene
therapy of LGMD2.

The treatment of LGMD2 disease by gene therapy is implemented by a 
pharmaceutical composition containing a component selected from the group of : 

a) a nucleic acid sequence according to claims 1 to 4, 

b) a cell line according to claim 24,

c) an aminoacid sequence according to claims 5 to 9. 

CLAIMS

1. A nucleic acid sequence comprising :

1) the sequence represented in Figure 8; or 

2) the sequence represented in Figure 2; or 

3) a part of the sequence of Figure 2 with the proviso that it is able to
code for a protein having a calcium-dependent protease activity involved in a
LGMD2 disease ; or

4) a sequence derived from a sequence defined in 1), 2) or 3) by
substitution, deletion or addition of one or more nucleotides with the proviso that
said sequence still codes for said protease.

2. A nucleic acid sequence that is complementary to a nucleic acid
sequence according to claim 1.

3. A nucleic acid sequence comprising in its structure a nucleotidic
sequence according to claim 1 or 2, under the control of regulatory elements,
and involved in the expression of calpain activity in a LGMD2 disease.

4. A nucleic acid sequence encoding the aminoacid sequence represented 
in Figure 2. 

5. An amino acid sequence which is coded by a nucleic acid sequence
according to claims 1 to 4, characterized in that it is a calcium-dependent
protease enzyme belonging to the calpain family, involved in the etiology of
LGMD2.

6. An amino acid sequence according to claim 5 or 6, characterized in that
either it contains the sequence such as represented in Figure 2, or the amino
acid sequence of Figure 2 modified by deletion, insertion and/or replacement of
one or more amino acids with the proviso that such amino acid sequence has the
calpain activity involved in LGMD2 disease.

7. An amino acid sequence according to claim 5 or 6, characterized in that
LGMD2 is LGMD2A.

8. A host cell unable to express a calpain enzyme activity, characterized in
that it is transformed or transfected with a nucleic acid sequence comprising all
or part of the nucleic acid sequence according to any one of claims 1 to 4.

9. Use of a nucleic acid according to one of claims 1 to 4 or a host cell
according to claim 8 in the manufacturing of a drug for the prevention or the
treatment of an LGMD2 disease.

10. Use of an amino acid sequence according to claims 5 to 6 in the
manufacturing of a drug for the prevention or the treatment of an LGMD2
disease.

11. Use according to claims 10 or 11, characterized in that LGMD2 is
LGMD2A.

12. Use of an amino acid sequence according to claims 5 to 7 for the
screening of the ligands of said amino acid sequence, said ligand being selected
in a group consisting of substrate(s), co-factors or regulatory components.

13. Use of a nucleic acid sequence according to one of claims 1 to 4 in a
screening method for the determination of the components which may act on the
regulation of gene expression of calpain.

14. Use of a host cell according to claim 8 in a screening method for the
determination of components active on the expression of the calpain.

15. A method for detecting a predisposition to a LGMD2 disease in a
family or a human being, such method comprising the steps of :

- selecting one or more exons or their flanking sequences of the gene,

- selecting primers specific for these exons, or their flanking sequences, or
a hybrid thereof,

- amplifying the nucleic acid sequences with these primers, the substrate for
this amplification being the DNA of a human being; and

- comparing the amplified sequence to the corresponding sequence derived
from Figure 2 or Figure 8.

16. The method according to claim 15, characterized in that the primers are
those selected from the group of :

a) those described in Table 1;

b) those described in Table 3;

c) those including the introns-exons junctions of Table 2; and

d) those derived from the primers in a), b) or c).

17. The method according to claim 15 or 16, characterized in that LGMD2 is
LGMD2A.

18. A kit for the detection of a predisposition to LGMD2 by nucleic acid
amplification, characterized in that it comprises primers selected from the group
of :

a) those described in Table 1;

b) those described in Table 3;

c) those including the introns-exons junctions of Table 2; and

d) those derived from the primers in a), b) or c).

19. Use of a host cell according to claim 8 in the manufacturing of a drug for
gene therapy of an LGMD2 disease.

20. Pharmaceutical composition for the treatment of an LGMD2 disease,
characterized in that it contains a component selected from the group of :

a) a nucleic acid sequence according to claims 1 to 4,

b) a host cell according to claim 8,

c) an amino acid sequence according to claims 5 to 7.

SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT:
(A) NAME: AFM
(B) STREET: 13, place de Rungis
(C) CITY: PARIS
(E) COUNTRY: FRANCE
(F) POSTAL CODE: 75013
(G) TELEPHONE: (1) 45 65 13 00

(ii) TITLE OF INVENTION: LGMD GENE
(iii) NUMBER OF SEQUENCES: 4

(iv) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, Version #1.25 (EPO)

(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3018 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:



TGATAGGTGC TTGTAAACTG TGCTTAACGA AAACATACCG TGTGCTGTAG GGACTTAACT 60
CTTGTTTATA TCAGTTAGCC TGGTTTCGCT AACAGTACAT CATTTTGCTT AAAGTCAGAG 120
CTTACGAGAA CCTATCGATG ATGTTAAGTG AGGATTTTCT CTGCTCAGGT GCACTTTTTT 180
TTTTTTTTAA GACGGAGTCT CTTTCTGTCA CCTGGGCTGG AGTGCAGTGG CGTGATCTGG 240
GTTCACAACA ACCTCTGCCT CCTGGGTTCA AGCAATTCTT CTGTCTCAGC CTCCCAAGTA 300
GCTGGGATTA CAGGCACCCG CCGCCACACC CGGCTTATTT TTGTATTTTT AGTAGAGACA 360
GGGTTTCACT ATTGTTGACC ATGCTGGTCT CGAACTCGTG ACCTCATGTG ATCCACCCGC 420
CTCGGCCTCC CAAAGTGCAG AGATTAGAGA CGTGAGCCAC ATGGCCCAGC AGGACCACTT 480
TTTAGCAGAT TCAGTCCGAG TGTTCATTTT GTGGATGGGG AGAGACAAGA GGTGCAAGGT 540
CAAGTGTGCA GGTAGAGACA GGGATTTTCT CAAATGAGGA CTCTGCTGAG TAGCATTTTC 600
CATGCAGACA TTTCCAATGA GCGCTGACCC AAGAACATTC TAAAAAGATA CGAAATCTAA 660
CATTGAATAA TGTTCTGATA TCCTAAAATT TTAGGAGTAA AAATCATGTT CTCTAAAATT 720
CACAGAATAT TTTTGTAGAA TTCAGTACCT CCCGTTCACC CTAAGTAGCT TTTTTGCAAT 780
ATTGTTTTCC ATTCATTTGA TGGGCAGTAG TTGGGTGGTC TGTATAACTG CCTACTGAAT 840
AACATGTCAG CAGTTCTCAG CTTCTTTCCA GTGTTCAGCT TACTCAGATA CTGCCTTTTC 900
ATTTTCTGTC AACACCAGCA CTTCATGTCA ACAGAAATGT CCCTAGCCAG GTTCTCTCTC 960
TACCATGCAG TCTCTCTTGC TCTCATACTC ACAGTGTTTC TTCAGATCTA TTTTTAGTTT 1020
TCCTGGCTCA AGCATCTTCA GGCCACTGAA ACACAACCCT CACTCTCTTT CTCTCTCCCT 1080
CTGGCATGCA TGCTGCTGGT AGGAGACCCC CAAGTCAACA TTGCTTCAGA AATCCTTTAG 1140
CACTCATTTC TCAGGAGAAC TTATGGCTTC AGAATCACAG CTCGGTTTTT AAGATGGACA 1200
TAACCTGTCC GACCTTCTGA TGGGCTTTCA ACTTTGAACT GGATGTGGAC ACTTTTCTCT 1260
CAGATGACAG AATTACTCCA ACTTCCCCTT TGCAGTTGCT TCCTTTCCTT GAAGGTAGCT 1320
GTATCTTATT TTCTTTAAAA AGCTTTTTCT TCCAAAGCCA CTTGCCATGC CGACCGTCAT 1380
TAGCGCATCT GTGGCTCCAA GGACAGCGGC TGAGCCCCGG TCCCCAGGGC CAGTTCCTCA 1440
CCCGGCCCAG AGCAAGGCCA CTGAGGCTGG GGGTGGAAAC CCAAGTGGCA TCTATTCAGC 1500
CATCATCAGC CGCAATTTTC CTATTATCGG AGTGAAAGAG AAGACATTCG AGCAACTTCA 1560
CAAGAAATGT CTAGAAAAGA AAGTTCTTTA TGTGGACCCT GAGTTCCCAC CGGATGAGAC 1620
CTCTCTCTTT TATAGCCAGA AGTTCCCCAT CCAGTTCGTC TGCAAGAGAC TCCGGTGAGT 1680
AGCTTCCTGC TTGCTGGCTG GGTTTCCCCC CCACGGAGGA GTCCTCTCAC TCAGCACCTC 1740
CGGCAGCTCA GCTGTGCACA TGGGCACTGG GGGAAGGATC CTGGCAGCAG CTCTGCTGGG 1800
CTCTGTCTTT AAGTGTGAAG CAGGGAGGAG AGGAACAGGT CTCAGATATT TCACCAAATC 1860
TCAGCAAAAT CCAGAGGGAG AGCGGAGGAG GTGGGGTGAT TCTTATGCTC TGGCTCTTTC 1920
TCTCTGAAAA AAAAAAAAAA ATCTTGCTTT TTATAAAAGT GGGTGGAACT CAGTTTAATT 1980
CATCCTGTAA AAATAAATAT TCCTTTCTCA GAACAAATTC CAGACAGCCC AGATGTACCT 2040
GTTCGTTTTA ATATTATTCA TCTTGGTAAG ATTATTTCAG TTTCTCTGGC TAAAATCATG 2100
ATGTTATTCT TCTTTAATTT ACCAATGGCC ATTCTTTCTG AAACACAGAA ACCCTAGAAA 2160
GAGAAGAGTC ATAGGCAAGG AATTTTTTTC ATGCATAAAA TGTTGGGGTT AAAGAGAGAG 2220
AGACCTAGCA ATCGCTTTGG TCCACCTACC TCACCTCATA AGTGAGGAGT CAAGGCACAC 2280
TAGAGTGAAA TATATCTAGT GGGCACATGA CAGAGCCCGG ATTAAAACTT TGTTTTAGGA 2340
AACTCTCCCA GCCTCTGGGT TTCATTTACA GTGATCGCCA GGAGGGAAAT CACATTCCCC 2400
TGGCTCACCT CTCTGATCAT CCCTCCAGTG TGACTCTTGT TCTTAATTCG AGAAATATTT 2460
ATTGAGCATC TACTAGTGCC AGCACTGGGC AAGCAACTGG GGGGACAGCA GTGAGTAAGA 2520
AAGACCAAAA TTCCAGCTGT CTTGGAACCT AGGGTCCTGA AGGGAAGATG GGCATTGAAC 2580
AAGAGTGACA TTGTCAGGAG ACGATGTTCT GGGTGCCACA GGATCATGTG GCAAGGAGAG 2640
CTAACCTGGT CCAGGGAGAC AAACCCTCTC TGAGGAAATG ATGACAAGCT GAGACCCAAT 2700
ACTATTGATT AGCCATGGTT TTCTTTAACC TAAGGTGGGC CAGGCATGGT GGCTCATGCC 2760
TATAAACCCA GCATTTTGGA AGGCCCAGGC TGGAGGATTG CTTGAGCCCA AGAGTTAGAG 2820
ACCAGCCTGG GCAACAGGGT GAAAACCTAT CTCTTTTGTA CTAAAAATTC AAAAAATTAT 2880
CCAGGCATGG TGGCACATGC CTGTGGTCCT AGCTACTCAG AGGCTGAGGT GGGAAGATCA 2940
CTTGAACTCG GGGAGTTTGA GGCAGCAGTG AGCCGAGATC ATGCCACTGC ACTCCAGGCT 3000
GGGTGACAGG AGTGAGAC 3018

(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 11451 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

GATCCACCCG CCTTGGCCTC CCAAAGTGCT GAGATTACAG GTGTGAGCCA CCACGCCCAG 60
CCGACACTGC CCTAACTCTC AAGTTGCATC CTTACTCGAA TAGTATGACA GTGTGGGAAG 120
CAGCATGGGA CAATGTAAAA AGGAGGCATG TTTCTGGCTT CTGCTACTTA CTAGCTGTGT 180
GTCTTTGCAC GAGTTTCTTA ACCTCTCTGG GCCTCAGTTT CCTTATCTGA AAAATAACAA 240
TGATAGTATT CCCTTCACAG GGCCAAATGG AATACTATCA GGAACACTAC ATAATGGAAC 300
TCAATAAATA ATAGCTACTG CGGCCGGGCG CGGTGGCTCA CATCTGTAAT CCCAGCACTT 360
TGGGAGGCCG KGGCGGGTQG ATCACAAGGT CAAGAGATGG AGACCATCCT GGCCAACATG 420
GTGAAACCGT ATCTCTACTA AAGATACAAA AATTAGCTGG GCATGGTGGC GCATGCCTAT 480
AGTCCCAGCT ACTCGAGAGG CTGAGGCAGG AGAATCACTT GAACCCCGGA GGCAGAGGTT 540
TCAGTGAGCC AAGATTGCAC CAGTGCACTG CAGCCTGGCG ACAGAGTGAG ACTCCGTCTC 600
AAAAAAATAC CTATCTATCT ATCTGTCTAT CTACTGTTAT TCTTACCTGG TCATTTCCTT 660
TTTGTTTCAC AGGAAATTTG CGAGAATCCC CGATTTATCA TTGATGGAGC CAACAGAACT 720
GACATCTGTC AAGGAGAGCT AGGTAGGAAA GTGCCTCAGG TCAGATCCTG CCAGATGATC 780
AAGGGGTGAT TACAAGGTGT GATCCCCTTC CAGGAGGTAA AGGGACAATC TGTGCTTGCT 840
TCCAGTAACT TTTTGGAAGA TTTTTTATAA CAGTTGCTTT ATGGTCGTTT ATCTACATGC 900
TGGCGATTGC TTCATTTCCT CCTACATGCC TCTTTAGCAC TCTGCCATGC ATCACAGGGG 960
GTATCTGCAT CCTGTGGCCT CCTCTCCAGT ATCTCAAGGA CACTTACATA CCCCACTCAG 1020
CATGACAAAA GCCCTGCTTT TCACTGTATC GTCTTTCTTG GAAGACAGCT CTGTGACTGT 1080
GCACCAAGCA TGCCCCTTGG GCATGGAGAT TCTAGATACA CACACAAAAG GCATCGCCAA 1140
GGAAAGCACT TGTAACTGGA ACCCTTGGTT TAAATTGGCC CAGCATAGCT CCATCTTTAA 1200
AAGAGTCTTT CCACAAAGAT GGCATCCGCC ATGTGGATGA GCATCCAATT TTCTCTTTGA 1260
TTGGTTAGCT TGACTGCTCC ATCTGATCTT CCTCTCTCTC GACCTCTTGT TCAGAAAGTA 1320
TTGTCTTTGG TGTGGACTAT AAGCAAGCTC TGTGAAGTAA AATTGGAGAG AACACCAACA 1380
GAAACAATTT AAATTTGAGG AAAAGGGGGC ACCTAAGACC AAAGGAATTT GGCTTATTTC 1440
ATTCCAGAAG GGGAGGCTGA GAATAAATCA GATGAATATC TGGGTTCCTG CACCTGAGGG 1500
AAGGCTTCCT GCAGAGCCCT GGGCATAATA ATCTGGGACC TTCAAACCAA TAACCTCTTT 1560
TCCAAGGAAA GACTGGCTGC TTCCAAGGAG GGTAGGGGAG AGTCGGGCTG CAGGCAGCTC 1620
TCAAGTCTCC CCTTGCACAC TCTCAGGTTG GCATTTTCAC TTTAACCCAT CCTCCCTTAA 1680
GAAGGCAGTT CTTTGTGACC AGGGTACACC CCCTATTATA TATATATATA CACACACAGA 1740
GAGAGAGAGA GAGAGAGAGA GAGAGAAAGA GAGCAAAGTG TTACCTCCAA CTACATACAG 1800
TACTCTGTCA GAAAAGAGGT TCAGAGAATA AGAAAACGTC CCGAGCTCAT TCCGTTGCCA 1860
GCAATGTCTT ACTGCCCCCT ATAGACGGGT TCCAGGGCAG CTGCCTACCT GGCCTTCCTT 1920
CCAATACAAA TCATCTTGGT GGATGGTTCT CTGAGGCTCA GTCTTCGCTG AAGTGAGAAG 1980
AGGAATTGGA CTCACATTGC AAAGGCACAG GGCAGGGCAG ATTTCCTACA GGTGTTAGGA 2040
AGAACAACCC AGTTATGATC ACCTACTGCT CTGTCTCCAT TGAGGCCTAA AAAGGAAGTG 2100
AGTTTATACT GCAGTTGGAG GAACTGCCTG CAGCCTTGAG GAAAATGTCT AGTCACAAGG 2160
GAGTAAGTTA CCTGTTGATC ATATTGTCAA GGAATTCCTG TCCAATTCTC CTTCCCTGGG 2220
TTGACACCTC TGTAAGGTCA GATCTGGAAG TAGGAGAGTG GGCACCAAGG GAGTCCCCGT 2280
TCAGGGAAGT GGAGTGGCTG GCTGGGATTG GGGCTTTTTC TTCCCAGGAG GAGCAGGAGT 2340
GCTCACGATC TGTGCCCTGT GTCTGGCTGC AGGGGACTGC TGGTTTCTCG CAGCCATTGC 2400
CTGCCTGACC CTGAACCAGC ACCTTCTTTT CCGAGTCATA CCCCATGATC AAAGTTTCAT 2460
CGAAAACTAC GCAGGGATCT TCCACTTCCA GGTGAGGTAA TGAGAGTGTA GTTAAGAGGG 2520
CCAGCGGCAG GCCACCCACC GCTGGTCTCC TGGCCTTGAC TTCCCAGAAG CTGGAGGAAA 2580
CTTCCCACCC ATCTACCCGC AGCGGCAACA GTCGGCATGG ACCCCCTTAA GGCTTCAAGC 2640
CTGGGAGGAA GCAGTTGCTT ATCTCTGGCT CCCTAATCCC TCCCCCACCA CCTTCCACTA 2700
TGTCCCAGAA AGACAGGAAG ACATCCTGTT TACTGTGGGT CTATTTTTGT CTTTGCAGCT 2760
GTCTGGCTGC TTTTATTGCC TGCAGCCCTT CTCAAGTAGG TCCCTAAGAT ATTAGCACTG 2820
TGACACCACA GGACCCTTCA GGTTGTACAG GAACCCCTGT CCAGGGCTCC TGTATACTTC 2880
TTCCTCTCTA AGGCATGGCG GTACCAAGGC TATCACTCCT CTCTTCCAAG CCCTGGAAGA 2940
AGAGTCTGCT TAACCTGGGG ATCAGGCTTC TTGTTTGCCC TAGAACTGAA TCTGATGGTT 3000
CTAGAATCCA TCCAGCTACT GGAAATTTTC TGGGTCCCAG TCACCTTGGC ATAGAGCTGG 3060
TGCTAGAGCA GAACCAAACT GAATTCTACC TGTGAGGGTC TCGTAGCTTC CGGGATGCTG 3120
GGGAGTCAGC CTGTCTCCAG CTTCAAAGGC TCCCTCATGT CCCAGGATGA CCCACATTAT 3180
CAGTTCTTGC TCCCCGGGTC TTGCACCTCA GCAGGGAAGG CCTCAGAAAA GGTCTGTCTC 3240
CAGGCTCAGA CTCCCCCTCC TGCCGCCTTG GGAACATGGC ATATTTAAAG GGTCTCAGAT 3300
CTAAAGGGCC TTACATACAA ATATCAGATA GATTTCTGTT CTCATTTCAA TGAGGGAGAA 3360
AGTGCCATTG AAAAGGAGAC TAAACCACAT TTGGCCCTTT TCAGTTCAAA CTGATTCATT 3420
CAAAAAAGAG CGACATCCAA ACTTGAAATG ATTGAACAAT GTTCCTGCTA CAGCTAGAAT 3480
AGATTCTGGG TCACTTTGTT CCTCCGTTTC AATCCTTGTT CTTCAGTTTG GCATCAAGAA 3540
ATACCTAAAT CAGCACAGTG CCTTCACTGC ATAGTTCCCA ATCCTGGCCA CATTGAATCA 3600
GCTGGGGGCA CCTGAGAGTG CTGACACCCA GGCCCTGCCC CAGACCTGCT GAGCAGGAGA 3660
ATGAAAATCT TACATCCTAA GACACTCATG GAGCACCTAC TCTACCCATT ACTGGGCTGG 3720
ACTCTGTGGA AGACATGAAG TATATGTAAC TCACTTCCAG CTCTCAAAAA GCACCCAGTC 3780
CAGTTAGAGA CAGATTTACA CACCCCAAAC ACAAAATAGG ATGAACAGGC ACCCAGATGC 3840
AGAGTCCAGG AAATGATGCT GCTTTGGGAT TCAAGAACCC CCTGAGGAAT GTGGAGGAAG 3900
GACACATTTC CTAACAGTAA TTTGAGTATG TGACTCTGTG CGTGACGCTT CTGTGCAGTT 3960
CTGGCGCTAT GGAGAGTGGG TGGACGTGGT TATAGATGAC TGCCTGCCAA CGTACAACAA 4020
TCAACTGGTT TTCACCAAGT CCAACCACCG CAATGAGTTC TGGAGTGCTC TGCTGGAGAA 4080
GGCTTATGCT AAGTAAGCAA CACTTTAGAA TGTGAGGTGG GGCTAGAGGT GAGAAAGTGG 4140
GTTGCAAAAT CCAGCCGAGA CCTCACTCAC AGGAAGAGGC ATGTGCCTCT ATACGTGCAT 4200
ATGTGTGGGC ATGCAAGTCC AACTGTGACC CAAAGTTAGA GATCAGTTCC AGGCAACAAC 4260
AGCTCTAACT AAAAACATTA AATTTAAGAG TAGAAATGAA GATTTGCATA GAAGACCTTT 4320
AGCTTTAGCT CACCATAGCG AGTTCTTTCA TTGCACCTCC ATGGTGGCAT TGCAAGTCTT 4380
GGGATCAGAG CATTGTCCCA GGGTCTCGAT TGGCTCAACC TCATGTGCTT ATAGAAGATT 4440

TATAAAGACA TGTTGTCTCT CAACTTAAAA GCTCCACCCC AGATGATAAT AATGGATTTT 4500 
CAAATTTTGG AACAAGGTCA CTCTGTAATG CAGGCTGGAG TGCAGTGGTG CAGTCACGGA 4560 
TCACTGTAGA TTGACCTCCT GGGTTCAAGG TGCTCCTCCC ACCTCAGCCT CCCAAGTAGC 4620 
TGGGACTACA TGCGGGCATG ACCATGGCCC TTTTATTTTT GTATTTTTTT GTAGAGCGGG 4680 
GTTTTCCCAT GTTGACCCAG ACTGTTCTCG AACTCTTGGG CTCATACAAT CCACCAGCGT 4740 
TGCCCTCCCG AAGCGCTGGG ATTGCCGGTG TGAGGCACCA CACCGGCAGC TGCTAATGGC 4800 
TTTAATGGAG CGCTTCCTCA AGGTTCAGGA TGTAGTGGAA AGAGCTGTGA GGAAGTGGGG 4860 
ATAGCTGGGT TTCAATCCGA GTGCTTGTGG CTCTCTGTGG TGTTGGGTGG GTCACTTAGC 4920 
CTCTTGAGGT CAGTTTCTTC ATTATGAAGA AAGGGAATCA TTGTTTCCAT CCCATGAGCT 4980 
CATAGGGTTA ATGTGGAATT GATGAAAGAA CATCACAGCA TCCAAGAGGT AAAGTTCTGG 5040 
TGGCAGTGGT ACCTGGGTTT TGTTCCCTGG AACTCTGTGA CCCCAAATTG GTCTTCATCC 5100 
TCTCTCTAAG GCTCCATGGT TCCTACGAAG CTCTGAAAGG TGGGAACACG AGAGAGGGCA 5160 
TGGAGGACTT CACAGGAGGG GTGGCAGAGT TTTTTGAGAT CAGGGATGCT CCTAGTGACA 5220 

TGTACAAGAT CATGAAGAAA GCCATCGAGA GAGGCTCCCT CATGGGCTGC TCCATTGATG 5280 

TAAGTCTGGG GTGTGGGGCA CAGGGTGGGG AGCTCCAAGT GTGAGGAAGC CTTTTACCGA 5340 

ATGAAGGGCA GCATAGAGCT TTTGTGTGGG ACAGAGCGAA TGTTTTGTTT GAGGAAGCAG 5400 

GAACTGGCTC TCAACTTTGA GGACTGGGAA TTTCTCAAGG GAGAACAGTT CTTCCGGATT 5460 

TTCAATAAAG ACACTGGTCA AGGACATTTG AAGCCCTGGA ATGTCAGTGG AAATCAGTCC 5520 

AGAGGCGTGT GTCAGTGGAG GCCTCCCTTG CTGGTGCTCC TCAGTCTCAG CACGCTCCCA 5580 

TTAAGCTGGC CACGTACTTG GCTGTGGACC TGAGCCCACC ATTTCCCTAA GAAAGCCTCC 5640 

CAGTCACTGG GCTTTCACCA CACCTCCCCG CTTGAGACGT GGGCTTTGTG TTGTTACCTG 5700 

GGAGAAGCTA AGGCTGCAGG ACCTTTCAGT GCAAAGAAAT GCTGTGAACT GAGACAGGAG 5760 

CCAAGGGTAG GGAGATGGCC GCCCATGGCC AGGCCTCCTT CAGGGGGCAT GCCTTCCCTG 5820 

AGGGCTGCTC AGTATATTGA TATGATAATC TTAGTGGTTT CCATTGGGGA GGATGGGGCT 5880 

GAAGCTGAAT TCCTGCCCCT TCTTCTCCCA ACACGCCCAA TGGACAGCTT GGAAGGTCAG 5940 

TTAGCACACA ACACCATGGA TGAACTTTTT TTCTGTATCA CTTTTCTCCG TCTTTCCTCC 6000 

ATTCGTGCTC TGTTGATCTC TCCTCTCTCC CTTTGTCTGT CCCATCTCTT TCTCCTCTCT 6060 
CCTTCCCTTT CCACCCTTCT GTGTTTGTTC TCTCCCTCCC CTGTGTTGTT CCCTACATTC 6120
TCCATCGGGC CTCAGGATGG CACGAACATG ACCTATGGAA CCTCTCCTTC TGGTCTGAAC 6180
ATGGGGGAGT TGATTGCACG GATGGTAAGG AATATGGATA ACTCACTGCT CCAGGACTCA 6240
GACCTCGACC CCAGAGGCTC AGATGAAAGA CCGACCCGGG TGTGTACACC TCCGATTATC 6300
AGAACTGACC ATCCCTCCAA CCCACATGAC CCCGCCCTAT TAGTGTCAGA CTCCCCTCAG 6360
CAGCCAGGGC CTTACCCACA CACCCCCACC TGGCACCTCC CAAGGGTCTG GGTTGAAATA 6420
ACTTGCTCAG CCAAGGCTCC TGAAGAGGGT GCAAGAACCA GGATTTTGGA GGGAATCTCT 6480
GCTGGAGTTT CTGCATATTC CATGGTCCAG GCAGTTCCTC TCATAACGAA CTATCAGACA 6540
GAAATACTTG TAAAGATACT TCATTTATTT TGAAATATTT TTCCTCTTCT AATGTATTCA 6600
TTTATTCATT CAACACTTAT TTTTGAGCTC CTACTATGTT CCAGGCACTC CTCTAGCAAA 6660
CAAAGCAAAT TCTCTCCTCT TTTTCAATAT TTGTGGAAAA AGCAAGGTCT CCCTCTTGTA 6720
GAGTTTATAT TCTAGTATTT TCATAAGTTA TACCTGCTCA CTGGAGAATA CTGAGCCATA 6780
CAGAAAAACA CAGAGGAAAA TTTCACTTAT ATTTTTCCCC ATGTAAAGAT AACCACTCTT 6840
AACATCTAGT ATATGTTCTT CCAGGATTTT TCTATGCACA CACTGAATCT GTATTTTTAT 6900
TTTTAAAATG TTATCATATT GTATGTACCT CTTTGCAGCC TGCTTTTTTC AGTTAGTTTT 6960
TTTGGTTTTT TGGTTTTTTT TTTTTTTTGG AAACCAAGTC TTGCTCTATT CCCTAGGCTG 7020
GAGCACAGTT GTTGCCATCT CGGCTCACTG CAACCTCTGC CTCCAAAGTT AAACTAATTC 7080
TCCTGCCTCA GCCTCCCGAC ATAGCTGGGA TTACAGGCAC ACACCACCAC ACATGGCTAA 7140
TTTTTGTATT TTTTAGTAGA GACGGGGTTT CACCATGTTG GCTGGAATGG TCTTGAACTC 7200
CTGACCTCAA GTGATCCACC TGCCTCAGCC TCCCAAAGTG CTGGGATTAC AAGTGTAAGC 7260
CACCACACCC GGCCTAGTTT GATATTCTTA ATGTGCCCAA AGTATTCTCC TGTAACATTT 7320
TTTAATAGCT ACACAATATT CAAACACACA GATATGTTAT AATTTATTTA CCCAATACCC 7380
TATTATTGGA AAGTTGAGTT CTTTTTTTTC TTTGTTTTGT TTTGTTTTGC TACTATTCTA 7440
AAATGCTATA ACGAACATCC CAATAGATAC ATCTTTGTAT ACATCCATGG TGACTTCCAT 7500
AGGACAGATT CCCAGCAGTA GAATTGCTGG GTTGAATGAT ATGCTTAGGG TAATGACAGA 7560
AGAGTCATTT CAAGCAGCTT CCTAGGGTCT TAGAACTTAA GGATTAATGA GTCTTCCCGC 7620
CCCCTCCCAG TCTATTCAGC ATGATCTGGA TCATGAGGAC TGAGATCTGG AAGAGACTGA 7680
GATCTGGGAG AGGCTGAGAT ACCAAAAGCC CTGGCTCCAC CCATACCCCT CGCCCTGAAA 7740
ACAGCTCTAG GAATTCCGCG GCCTAGCAAG GCTCCGGGAA GCTCCTTTTA AAGCTGTGAC 7800
GTTAGTAGGC ACATGGACCA TAGAGACCTA TCCAGGGCTC ATGGGACTTT AGTGATCCTG 7860
CCCTTCTCCC AAGGATCCCC CATGGCTGCA ACTTGGAAAT TTCTGCAAAT GGAAGAGCTA 7920
CTCCTTAGGC ACGGTCATGT CTGAGCAGGG ATCTCCTCGG GCTTTCTTAG AATTCTCTCC 7980
CTGGGCACTG GGACTCTTGA TTTCTTGAAT ATTATGTTCC AGGTGGGTGT GGAGGAGGTG 8040
AGGGGATGTA AAGAAGGCTA GACTTGGCCA GGCGCAGTGG CTCATGCCTG TAATCCCAGC 8100
ACTTTGGGAG GCTGAGGCGG GTGGATCACC TGAGGTCAGG AGTTCGAGAC CAGCCTGGCT 8160
AACATGGTGA AACCCCGTTT CTACTAAAAA TACAAAAAAT TAGCTGAGCA TGGTGGCACG 8220
TGCCTGTAAT CCCAGCTACT CGGGAGGCTG AGGCAGGAGT ATCGCTGGAA CACGGGAGGC 8280
AGAGATTGCA GTGACCCGAG ATCGCGCCAC TGCACTCCAG CCTGGGCGAC ACAGCAAGAC 8340
TCTGTCTCAA AAAACAAAAA AGAAAGAAAA AAAGGAAAAG CTAAGACTTA CATGTGTCAC 8400
TTAACCCCTT TTCTCAAACC TCTTTCTCTT CCAGGAATAG TCAACCCCTG GATGGCTTCA 8460
GGGGAAGGGG GATCCTGAAG CCCAGGGCAG CCTCCAACTC TACCCCTTCC TCCTTTGAAG 8520
GATACTAAGG GGTCCAGAAA GGAGGGGCAG GACACTGTTA CCCACCCCAC ATCCCAGCAT 8580
CCACATTGCT CTCTGATGGT CAGGACAGAG CCTTCTCAGG GAGACCAGCC TGTCTGGAGC 8640
TGTGTCTCTT GGCACTCTTA AAGGGCCACT GAAGGTCCGT TCGTGGTCGT GAGGCACACT 8700
TTCAGGGAGC AGAGTGGTCT GTGTCTTCAC AGAGCCCGGA AAATGAACTA GTATGAACTT 8760
TGCCTCCAAG CAGCAGAACT TCTGTTCCCC CGCCCCTAAT GGGTTCTCTG GTTACTGCTC 8820
TACAGACAAT CATTCCGGTT CAGTATGAGA CAAGAATGGC CTGCGGGCTG GTCAGAGGTC 8880
ACGCCTACTC TGTCACGGGG CTGGATGAGG TAAGCCTGGT GGGGCTTGGT GGGGCAAGGG 8940
CACCCTCCTG GGTTAACCTC ATGAAGTCAG GACTTAGCTG TTGGGGCCCC TGCCCTGTCT 9000
GCAGAGCTTG CCTCCAATCA GGACATTCAG TTCAAGGTCC AAGCCACGCC TGGGAGCAGA 9060
GGGGCCTGTG AAACTGGTAG AGGTGGATCC TGCCACAGTT GGTGCACAGT TTATCTTTGC 9120
TTTTCGTGCT AAAGATGGCA ATTTTTCCAA CATTTCCAAT GAACAAATTG AAATATCACT 9180
TAACTTTGCT TTTACAAAGT TGGTTTCATG TGTTCTTGAG CTTCCTGTTC TCTCGTGTTC 9240
AGATAGCTAC AGTTGTCTCT GGGTAGCCAC GGGGACTGGT TCCAGAAGCC CCAACAGTAA 9300
CAAAATCTGC AGATGCTCAA GTCCCTTCTG TAAAATGGAG TAGTATTTGC ATATAACCTA 9360
TGCACATCCT CCCATATACT TTAAGTCATC TCTGGATTAC TTACGATACC TAACACAATG 9420
GAAATGCTAT GTAAATAGTT ATTGCACTGC ATTGGGTTTT TTTGGTATTA TTTTCTGTTG 9480
TTGTATTATT ATTTTTTCTT TTTTTGAATA TTTTTGATCC ACAATTGGTT ATATGCCAAA 9540
GCCATGGATA CGAGAGGCTG ACTGTTCTGT TTTGCTCCTT CTGGGACTTC TGGGTTTTCC 9600
TGGACCATGT CTGAGACAGG AACGTTGTAA GACCTGTTGC ACACAGTTGG GCAGGTTGTG 9660
CCCTGTACAG AGGGATGGGC TGAGAGGGGC AGTTGCCTGC ATCACCCATT GCAGCAGACT 9720
GGAGGGAGTC TGCTTGTTTG TAGTTCCTCA GTCAGCAGGG GCCTTTTGTC TTTCCTTCCT 9780
TTCCTTTTTT TTTTTTTTTG AGACGGAGTC TCACTCTGTT GCCCAGGCTG GAGTGTAGTG 9840
GCACAGTCTC GGCTCACTGC AATGTCCGCC TCCTGGATTC AAGCGATTTT CCTGCCTCAG 9900
CCTCCTGAGT AGCTGGGATT ACAGGCGCGT GTCACCATGC CCAGCTAATT TTTGTATTTT 9960
TAGTAGAGAT GGGGGTTTCT CCATGTTGAT CAGGCTGGTC TCGAACTCCT GACCTCGTGA 10020
TCCGCCCACC TCGGCCTCTC AAAGTGCTGG GATTACAGGC GTGAGCCACC ACGCCTGGCC 10080
AGCAGGGGCC TTTTTTCTAA TTTATATGAA GACACCTAAT TTATATGTGT TAGCAAAGCC 10140
CTCCTGTTTA TGCCTCACCT CCTCGCCCGA AGCTCATACG GCAGGATGTT CCTGAGAAAA 10200
TTGCCTCTTA GAAGATAGAG AGGAGATGCC AAGCCTAAGT TAGGCAGACT CAGGAGGATA 10260
GGTCTGACCC ACCCCCTGCC ATTCCCCAGC ACACTTGTGA TTAATCTCCT TGGCCAGAGC 10320
CAGGCAGAAC ACCCTCGCGT AAGAGATTTG CCCCCCAGCC CCGTCCCAGC CCTCAGCTAG 10380
ACAGAAGATT CCCTTTCCAG AGAGGCTGCA GAGCATGAGA GCTCTTTCTG TGTGCTTAAG 10440
GTCCCGTTCA AAGGTGAGAA AGTGAAGCTG GTGCGGCTGC GGAATCCGTG GGGCCAGGTG 10500
GAGTGGAACG GTTCTTGGAG TGATAGGTAG GTGAGGGGAC CCCACGGGAT TGGCGGTGGC 10560
GGGGAACAGG GTCCGGGACA AGGCTGTGTT GGGAACTGAG CCATGAGAGT ATTGAAGATG 10620
CTTGGTATAA AATCACCCTC AAAACCAATG ATCCGCAGAG AAGAGGGGCA CAGGTGTTGG 10680
CTCCAGGGAA GGGCCAGGAG TGGAAGCGGG GTGCTGGGGA CCCAGAGAGG TTGCTGACAA 10740
CCATTGGCTG GAAAGGAAGG ATTCCAGAAA GCGTGGGGAA GGTCCAGGCA GGAAAAGCGT 10800
ATGAATGCAG GGTTCTGGGC TAGAGAAGTG ACTTCCCTTC TTGGGGTCTT GTGTTGCCTT 10860
TCCTGTGAAA TGGGAACAGT ATTATTAGCA CTTACCTTGT GGGCTGATAT TGAGGAGTAA 10920
CTGGGACTTG TTTTTGGGCA AGTGCTGACC CATTGCTAAG ATTCCCCTTA CCCGTGCTTG 10980
TCCCTTGTAT TAAGGCACAA GGGCCCTTTG AAAAGAATTT TACCTGCTTT ATCAATTGAA 11040
AGGGATTAAG ACCTTGGGGG CCAACCCAAA ATAAACATGC GAACTTATTA TTTATAGGCT 11100
CCATGCACAC TTCGTAAAAC CTCCATGGTC CTACTGGTTC CTGATTACCT CCACTCAATG 11160
AGAGGCAATT CATTACTGAA TGAGCCATAA GCGCCTCTTA TTTCGAGAGG GGGATGGCAG 11220
GACTCAGTCG AGGAGAAGGA CCGCACCCAG GCAGCCTGGG CCCCTCGGCT CCTGTACTTA 11280
TTTACTGCTG GGTACTTCCT AGCCCAGCAT GTAATTACTG GTTCGTTCAG TCATTCGTTT 11340
AGTAAATGTT TCTTGGGCAC CTACTACATA GGAGGCACAG GTCAAGGCAC TGGGGATATT 11400
CTTTCTACCC ACCCCCTCCC TCCCTACACT GTGATTAGGG ACTGACCGAT 11451



FIG. 8B/8

SUBSTITUTE SHEET (RULE 26) 



WO 96/16175 PCT/EP95/04575




(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1834 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

ATTTTTTTTT TTTTTnTGA GACGGAGTCT CACTCTGCCA CCCAGGCTGG AGTGCAATGG
CGCGATCTTG GCTCACTGCA ACCTCCGCCT CCCGGGTTCA AGTGATTCTT CTGCCTTAGC
CTCCTGAGTA GCTGAGACTA TAGGTGCCCG CCACCACGCC CAGCTAATTT TTGTATTTTT
ATTAGGACGG GGTTTCACCA TATTGGCCAG GCTGGTCTCG AAATCCTGAC CTTGTGATCC
GCCCACCTCG GCCTCCCAAA GTGCTGGGAT TACAGGTGTG AGCCATTGCG AGCAGCCCAG
AACTCAATTC TTAACCTTTA AAGTATGATG AGAAGAAGGA TCAAGCCCTC ACCAGCCCAT
TTAAGGAGTT TAGGCTCAGT CTTGAGGATG TGAGAAGTCA TTGCTATTGG GTTTCACACT
GAGGTTAACA GGTGAAGTCA GCATTTTGGT AGTTCACAGC AGGTGCAACT CTTTGTATTT
CTCTGATACC TCCTGTCCCA ACCTACATCA GGCCTTCCCT TCTTCCTGCT TCCTTAATTC
CTCCATTTTC CCACCAGATG GAAGGACTGG AGCTTTGTGG ACAAAGATGA GAAGGCCCGT
CTGCAGCACC AGGTCACTGA GGATGGAGAG TTCTGGTGAG TCCAGAACCC AGGAAGACCC
AGAAGGGTAA GGGTGGGGAA GAGAGGGGAA ATCTCAGACC TCAGTCCCCA GCTAAGGTTA
TCAGATTCCA GCCCTTGGGA GATCTTGGCT GTGTTCTCCT CCAGCCCAAG GCCCAGCAAG
GATGAGGTTC TGAGAGGAGC CTTCCAGGCC ACAGGGACAA TGAGCCCAGG ACCAGGCCAA
CATGACATGG CTCTTGCCTC CTGTGTGCCC CTCCGCCACA CACTCTATTC CAGCGACAGG
CACCCTGGCC TTAGCACAAT TCTTTTCTGA GCCTAGGAAG CTCCACTTAC CCTGATCTTC
CAACGTCAAC CTCACCCTCT CTCAGGTTGT TTCTATTCAG GCTTCAAGTC TCAGCTTAAG
GAGAATTTTC AAGTCTCAGC TTAAGGAGAG CCCCCTAAGT TCCCCGAGGA CTGGGATTAA
TTTATGATGC TCATCACCCT TAAAATTGTT TGCTTAAGCC GGGCGCGGTG GCTCACGCCT
GTAATCCCAG CACTTTGGGA GGCCGAGGTG AACGGATCAC GAGGTCAGGA GATCGAGAAC
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ATCTTCGCTA ACACGGTCAA ACCCTCTCTC TACTAAAAAT ACACAAAAAA ACTACCCCCC 
CCTCCCACCC TCCCCCTCTA CTCCTACCTC CTGCCGACCC TGAGGCAGGA GAATCACTTG 
AACCTGGGAG GCAGAGGTTA CAGTGAGCCC AGATTGCGCC ACTGCACTCC AGCGTGGGCG 
ACAAGAGAGA GTGTGTCTTG GAAAAAAAAA AAAAAATGTG GTCTTAGTTT AATGTCAAGG 
GAAAGGTTTT GGGTCTTTTT ATTACTTTAT TTTTTATTTA AAAACTATAA TAGAGACGGG 
CCTCGCTATA TTTGTCGGGC TGGTCTCAAA CTCCTGGGCT CAAGCGGTCC TCCCACCTTG 
GCCTCCCAAA ATGCTGGCAT GTGGGCCTGG TCAACATATG GGACCCCAAC TCTACAAAAA 
ATITTAAAAT TAGCCAGATG TGGTGGGGTG TGGGTGTAGT GGCAGCTAGT TGGGAGGCTG 
AAGCAGGGGG TGAGTTGAGC CCAGGAGGTT GAGGCTGGAG TGAACTATGA ITGTCGTTCA 
CTTTTGTTGT GAACGTGAGA TTAAGTGTAG TCAGCAATTT GGGTTAGGAT TATTTATTGA 
GAATTTTTAA CCGTCACGTT GCGGCAAACC AGGT 
(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 14664 base pairs

(B) TYPE:

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

AGGAGGTGGA GGTTGCAGTG AGCCAAGATC ATGCCACTGC ACTCTAGCCT GGGCAACAGA
GCGAGACTCT GTCTCAAAAA ATACACACAC ACACACACAC ACACACACAC ACACACACAC
ACACACATAT ATATACACAC ATATATATAC ACACACATAT ACACACACAC ACGTCTGTAT
ATATATGTGT GTGTGTATAT ATACACACAC ACACTATTCT ATATATTCTT GTAGAGCTAT
GTGTGTCTCC TGTGCTATTG AGCATGAGCC CTTTTTTTTT TTlllTTnT TTGAGACAGA
GTCTCACTTT GTCGCCCAGG CTGGCATACA ATGGCGCAAT ATCGGCTCAC TGCAACCTCC
GCCTCCTGGG TTCAAGTGAT TCTCCTGCCT CAGCCTCCCA AGTAACTAGG ATTACAAGTG
CCCGCCATAA TGCTCAGCTA ATTTTTGTAT TTTCAGTAGA GATGGGGTTT CACCATGTTG
GCCAAGCTGG TCTCAAACTC CTAGCCTCAG GTGATCCACC TGCCTCAGCC TCCCAAAGTG
CTGGGATTAC AGGCATGAGC CACAGCACCC TGGTGAGCAC TAGAGCTTAT TTCTTCTATC
TAACTGTATT TTTGTATCCA TTAGCCACCC TCTTTTCATC CTCCCCTCTC CTTCCCTTCC
CAGCCTCTGG TAACCACTGT CTGCTCTCTA CTTCCATGAC ATATGCTTTG TTTTAGCTCT
CACATATGAG TGAGAGCATG CGACATTTAT CTTTCTGGCC CTGGCACATT TTTGAATCAT
TGTTAGAAAA GATGATGGTT TGGAGTAGAT ACATCAGAAG TGACAGCGTT TGCCCTAAAA
AGGAAAGACA GGCTCCTCTG GGACCCTGAC CAAGTTCCTG TGAACTATTT TATTATTGTG
CTGTGTTAGT CCTGGGGTCT TCCGTTCCCA GCCCTCCTCA CCTGCTCCCA TATGGCTCTC
TCTCTTCTTC CAACCTCTCA GGATGTCCTA TGAGGATTTC ATCTACCATT TCACAAAGTT
GGAGATCTGC AACCTCACGG CCGATGCTCT GCAGTCTGAC AAGCTTCAGA CCTGGACAGT
GTCTGTGAAC GAGGGCCGCT GGGl:^CGGGG TTGCTCTGCC GGAGGCTGCC GCAACTTCCC
AGGTGGGAGA TGCTCTTGAT GGGGGGhGGG TCTAAGCCGA AAAAGTTCCA GGCAGAAGAA
GCCTAACTAG TGCTTATTAA GTCTCTCTGT TCCAGACGTC CACTATCTTA TTAAACCTTC
CCTGTTTTAC TGAGAAGGAA ACCACCATGC TGAGAAGTTT GCAATAGGGA GCTGGGTAGC
AACTTTGGAA GCAGGAACTT GTGGGAACAA TGCAGATGCT GCTTGGACTT ACGATGAGGT
TATGTCCAGA TAAGCCCATC CATCTTTTGA AAATACCCTA AGTGAAAAGT GCATCCAATA
TGCCTAACCC CCCAAACCTC ATAGCTTACC CTGGCCTACC CTCAAACATT GCTCGGAACC
CTTGACCTTA AGCCTAAAGT TGGGCCAAAT CATCTAACTC CAAAGCCTAT TTTACAAAGA
AAGTTGTTGT AATATCTCCA TGTAACTTAC TTAATACTTG TACCTAAAAA GTGAAAAACA
AGAATGGTTG TACGGGTACT CGAAATCCAG TTTCTACTGA ATGTGCATCT CTTTCACATT
GTAAAGTTAA AAAATTGTAG CCGAACCATC CTAAGTCAGG GACTGTGAGT ACTGTGTCAG
TAACAGTAAG GGCACTATTG GAGAACCAAG TTAGCAGCTG CTGCAATAGT TCAAGTCAGA
GATGATGAAA ACCTAGACCA AGTCAGTAGC AGCAGAGATG GAGGGGAGAC AGCAGATTTA
GGGAGAGCAT ATTGGGTGAT GTAGGGAAGG AAGAAGAATG ATGTCAAGAT TCCCAGTTGG
GGACCTGACA ACATTGCAAC ATAAGACACA CAAGAAGATC GGGTGGGTGG CTCATGCCTA
TAATCCCAGC ACTTTGGGAG GCAGAGCCAG GAGGATCACT TGAGCCCAGG AGTTCAAGAC
CAGCACAGGC AACATAGTGA CACCTCATCG TTACCCAAAA TAAAAAAAAA AATGAGGTGG
GAGGATTGCT TGAGCTCGGG AGGTTGAGGC TACAATAAAC TGTGATCATG CCACTGCACT
CCTGCCTGGG TGACAGAGTG AGACCCTGCC TCAAAAAAAA AAGACACACA AGAGAAAAAT
ATCAGCGTGT TGTTTGTTTT TGGTGGAGTT AATTGTGGGG TTCTAGGGAA AGGAATTTAG
CTTGGGACAT GGAAAGTTTG AGGTTCCTGT AGAGTGTCCC AGTGAAGATT TGTAATAGAG
CATCGGATGC GCATATTAGA TGGCACTTGG TGATATGATA AGAACTCAAA AAATATTTGA
GGAATAAAGG AAAGAAGAGG CCAGACGTGG TGGCTTATGC CTGTAATCCC AGCACTTTGG
GAGGCTGAGG CAGGCGGATC ACTTGTGGTC AGGAGTTCGA GACCAGCTTG GCTAACATGG
TGAAAACCCA TCTCTACTAA AGATACAAAA ATTAACCGGG GATGATGGTG GGTGCCTGTA
ATCCCAGCTA CTTGGGAGGC TCAGTCAGAA GAATCGCTTG AACCCAGGAG GCGGAGGCTG
CAGTGAGCCG AGATCGCGCC ACTGCACTCT AGCCTGGGCA ACAGAGCCAG ACTCCGTCTC
AAAAAAAAAA AAGTGAGAGA GATTGAGGCT GGGATATATG GCTCAGGCAT CATGCGCGTG
TAGGGGGCAG TTAAAAAGCA GAAGTAAGAA AGATTGCCTA GGGAGGCAGG AAGGGTGAGG
TGAGAGGAGA AGAGGCCCAG GACCAGATTC TAGTCACCAA CAGCGTTTAA GGGGCAGGTA
AGGAAAACAA AACCATCAGC AAAGACTGAG AATGAAAGCC CAGAGAGGAA GGAAAAGCCA
CACATACAAT CAGTACAGCT CCATCTGAAT AAAGG7AGCG CCCCCCCCCC CCCAAATCAT
TAGAGAAATG CCTGATTCGG TTTTCTGTGG ATTTTTCCTA AGAACCTAGA TGTGGGGAAT
AGAAATAAAT GGTTCCCTCT GTCTCATCCC CTCCCTGCCC TCTGAGAGGA AGCTGTGATT
GCGTGCTCCC TTTCTGGGGG TGCAGATACT TTCTGGACCA ACCCTCAGTA CCGTCCGAAG
CTCCTGGAGG AGGACGATGA CCCTGATGAC TCGGAGGTGA TTTGCAGCTT CCTGGTGGCC
CTGATGCAGA AGAACCGGCG GAAGGACCGG AAGCTAGGGG CCAGTCTCTT CACCATTGCC
TTCGCCATCT ACGAGGTGTG TAGTCCTGAT TGGCTCCAGC CCAGGAAACA TACTTTCCCA
GAGAGGACGC TTCCAGGGGC TTCTAGAGGG GCCCTCTGCT TCCTCAATAC CAGTGACCCA
CAGAGCTCCT GGTATCAGGA CCACTTGTGT TTGTAACAAG CAAAAAATAC CAGGGGGGGC
ATTAGAGAGG CAGTGGAGCG GGCCTGGCAG AACAGGTGCC TGGGGGTCAG GCTTCCGCAT
GCGGGCTGCA GTTGCTGGCA TTGCCTTCCG CAGGCTCCTC ATCCTCATTC ACATCTGAAG
CATCTTCCTT TCTGTTTCTT CTCAAGGTTC CCAAAGAGGT ATAGCAGCAG CAGCGGCCAG
CAGTTGTGTG CAGCACTACC CAGGGGGGCC CGAGTCTGTC TGTGGCTCGT CGAGAAGCTT
CCTGGTGGGG TTTGTGGGCA GGACTTGTGA TAGGAGAGGG CCTTGCCTGT TGTTATTTCC
CACTTGCAGA GCAGGTTGCC TCAGGGCATT GCATGACCCA TGACTACCAC CCCCAGGATG
TGCACTTTCT CCCTCGCACC AGACACTGCA CGTCACACAC ATGCCTTTGC ACACTCACCC
TCCTCCACGC TTACAGCCAC ACACACAGTC ACACAGACGC GTTCTGAGGG TGGCTGCCCG
CTTGGGATGG AGGAATCACT TCCCTCAGAA CCCAGCCAAG TCCTCTAGGC CTCCTTGGGG
GTCCTTCCAG CCTGAGGGGC TTCGGAGCTG AGGACAGCTG TTCTGGTAAG TGTCCCTGAG
TGTGGGGATG ACACATTTCC ATTCACTCTG AATCACAACA GAAAAGGGAA GAGGAATTGA
GGTAGGGAGC CTATTTAACC CTTGGGAGTC GGGAAGTAGG GAGGTTGAAA CTGTGACATG
GGTGACCAGG GAGTTGGGAA GGGACCCTTG GAGGTGGCTG TGGCAGGACA GGACGTTCCT
CCCGAGGGGC TCATGTGCCC TGGGCTCTCC CCATCTCTCA GATGCACGGG AACAiAGCAGC
ACCTGCAGAA GGACTTCTTC CTGTACAACG CCTCCAAGGC CAGGAGCAAA ACCTACATCA
ACATGCGGGA GGTGTCCCAG CGCTTCCGCC TGCCTCCCAG CGAGTACGTC ATCGTGCCCT
CCACCTACGA GCCCCACCAG GAGGGGGAAT TCATCCTCCG GGTCTTCTCT GAAAAGAGGA 4500
ACCTCTCTGA GTGAGTGCTG GCCCAGCTTT CCCACGTGTT TCTAAAAGCT CACATGGCCC 4560
ACTCCAGAGG TTGAAGGCAT GAGGCAGCTA GACACGTCTC CTCCAGGGTC CTTCTGCTGC 4620
TCCTGAGCCA CTGGCCACAT TACCCCCATT CATTCATTCA TCCATTCTGT GATATTTATT 4680
GAGCACCTAC TATGTTCCAG GCACTGTCCT AGGCACTAAG GATAGAGTAG TGAAGTAAAC 4740
AGAAAGAAAT CCCTGCCTTC ATGGAGCTTA ATATTCTAAC ATGAGACAAT AATGGATAGG 4800
AAAAACATAT GTAGCATGTT AGATTTGGAG AGGTGATATG GAGCAAAAAT AAAGTAGGGA 4860
AGAGGGATAG GAGGTGTTGG GGATGCTTGA AATTTTAGGT TAGCATGGCC AGGAAAGCCA 4920
CATCCTGTCC CTGGCCACCA CAGATGAGCT CATAGCCCCT GCCACTCTGA TCTCTGTCCT 4980
TGGAAGATGC ACCAGGTCCA TGGGTAGGTG GCTGGGTCAT GCCTTTGGGG GGCTCTGAGC 5040
AATACTAACA AGAACCTGCG TGCCTGGGCT TGGCTGTCGG GGATGGTGCT GACATGGGGC 5100
TGGTTCCTGG GGTTGGGGTG TTCCAGGGGT TCTCTAGAGG CTGGTTCTGG CTTGGCTGCC 5160
AGGAAGCCGT GCACCAGAGC AAACCGTCCA CGGGCCTCCT GCTTGCTTCT GGTGACACTG 5220
AGACCCCACA TGTCTGTATT CCTCACAGGG AAGTTGAAAA TACCATCTCC GTGGATCGGC 5280
CAGTGGTGAG TGGTTTAGAT CTTCTGTGCG AAAAGTCCAG AGGGTCCCCT TCCCTGACCA 5340
TGCAGGGGAC AGATGGTGCA GGGGAGAATG GGCACTGGCA GAGGGAATGG GAGTCTGGGC 5400
TGTGCTGAGC AGTCCCTCCT TGGCACTGCA AATCCTACTT TGGCATGGCC AGAAGTAATC 5460
GGCCTTAAGC ACCGGCGGCC ATTGAGGCAG TTCAGGGGCT GGGAAATATG GAAGAGGGTC 5520
CTGGAAAGGA GAAGCAATTT GAACAATCGG AGGGAACAAG GCCACAGGAA GGGATGACAA 5580
GAGCCGCAGC GAACACTGGA TTCTGAGACT GGATAACATT GGATTTCACA CATAGAGAAA 5640
AGAAAGTAAG CTGGTGCCGG ACCTGGTGTT GACACTTGGA TCCTCCACTT ACCAGGGGGG 5700
TGACCTGGAC AATTTCTGTA ATCCCTCTCA CTCAGTTTCC TACTCAGTAA AACGGGGATG 5760
ATAATGTGCC TTGCAAGGCT TTTGTGAGGC TTCATCAATG AGGTGATGTA TGTGAAGTGT 5820
CTGGCACAGC ATGGGCACTC AAACAGAGGT GCTTTTTCAC ACTTTACACC TTACAAGGTA 5880
CTTTTCACAT GTGTCATCGC GATACTTGCA AGGTTGCTGA GAGGTAGATG GGGTTATAAT 5940
CCCTGGTGTT CAAGAAAGGA AGCAGAGGCT CAATGGGGTT GAATGACTTC TCTGAGTTCA 6000
CAGAGCTCAG TAAGTGGCAG GGTTTGGAAC TCACATTCAG ACTCTCTGAC TCCAGACTTA 6060
GGTTTTTCCG CACCTCCACG CTGAGGCCAG CCCCAGGGAG TGAGAAGCCC AAAGTCCGAA
GCACAGAGTG CTGTGTGTTG GGCTCTGTGT GTTGAGGAGT CTTGTGACTG CCTTGGGGCT
TTGGGCTGTA GTCAGCTGAC AGTCCTTTGT GCTCTGTGGG GATGACGTAG GCCAATGGGA
GGACAAATGC CCCTCTGAAC TGTCTTCTGG GCAGTGACAG TCATGGTCAT AATCCTGACC
CTGAGCCAGT GCCAGGTCTC CAAGTGCCTT CTGAATGACC ACAGGCGATT GGTTTTAGTG
GTAGGTGCGT GGGGATCTGT TCTGGTCATC TGGATGCTGG TCATCGGGTG CAGTATTGAT
CAGGACCTGC AAACCCAAAA GCTTATGGGA GCTGGCACGT CACGTGAGTA GAGCAGGCAG
GTGCAGGGTT TTTGATGTCC CTGCACTGAC ACAGTTGTCT GCAGTTCTCC AATTTGACAT
TTGGGCTCCA GTGTCGAGGG TCAAACAAGG AATTTTGGGG CGTGGGCGAA ATCTGGGAAG
ACACAGGGAG CAGGGCCCTT TGGCTCAAGC TGATAGTTGC CGCAGGGATT ACCAGGCCCA
GGGCAGCCTG CCACAAGCTG GGGCTTTTAC CAAAGAAAAT CTCCCTATGT TAAATGCTTG
CTCAAAAATT TTTAAAAAAT ATTCTGTAAG TCAAAATCCA TTGTTAGGTC AGTTTGAGAG
AGCCATGTTT TTGGTGTTTT AGTAACCAAT TTCATTTTTT TATTATTTAT TTATTTGTTT
ATTTTTGAGA CGGAGTTTCA CTCTTGTCAC CCAGGCTGGA GTGCAATGGC ATGATCTCAG
CTCACTGCAA CCTCCGCCTC CCGGGTTCAA GCAATTCTCC TGCCTCAGCC TCCTGAGTAG
CTGAGATTAC AGGTGCCCAC CATCACGCCT GGATAATTTT TGTATTTTTT AGTCGAGATG
GGGTTTCACC ATGTTGGCCA GGATAGTCCT GAACTACTGA CCTGAGATAA TCCGCCCACC
TCAGCCTCCC AAAGTGCTGG GATTACAGGC ATGAGCCAGC ACGCCCGGCC ACCAATTTCA
TTTTTTAAAA AAGGAAGAAA GAAAACCTTA GCCAGAAGAT CTTTTTCCTT GCCATATGCA
GTAAGAGTAG ATTATAAAAA CAAAGTCAGA GCAGTCACTG GTGTCTGGGC ATGGAGGAGA
AAGAAGAATT CTCTTCTCCC TTCACCCTCC ATGCCCCTTT TTGGCTCCAT GTGATTCAGA
TTTCTGGACC CTGGAGCCCC ACCCCAAGCT AAAGACCAGG ATACAGGGAA GCCACAACCA
CTGGCGGTTC TGAGAACTTA CTTTTCACTT ATTCTGCATT TACTGTTTCC TTTTCTTATG
CAGAAAAAGA AAAAAACCAA GGTAGGTGTG TGGGTAGAGA GCATGAAGTG TGTGTACTCA
TGCATATGTA TGTGCATGCA TGTGAAGTGT GCATGTGTGA GCTCATATGC ATCCATGCAC
CAGACTTGCC TCTTCCTCCC CCTCCTTCCT GAGCTTCTGC TGGGGCCGAG CGTGCAGTAA
TGACAACTAC GATTTGCTGG GGGAAGGCTA CGTGCCAAGC ACTCTTTTAG GTGCTTTCCA
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TCAITAATTC CTTCCTCACA ACAGCCCTAT OAGAITAGTA CTATAACTAT CCCCAXTTTC 
ACACC.AGAA AACCTACACA CTTCACTAAC TTGCCCAACC CCACACACCC AOACACCCGC 
AGACCCAGTA CTTACAGCCA CCCAGTCTCG CTCCAGAOTC CCTCTCCTGA ACGACAAGAG 
GCCATCATAC GCCAICACAT TTGCTGCTAC CATTTCTGGI GGTGGCTGGT GGTGAIOGAT 
CCATGAGACG GGTCCTCCAG GTAGTGGTGG TGGCCGAGAC GAOAGCTGAG AGTGGTCAGG 
GAGTAGGAGA TTGGAGGGAG TGTGCTTGCG GTGAGTGCGT GTGTTTTTTT TGGGGGGCAA 
TIAIAAOAGT AIGTAGAAAG TAGGTGGTGI TAITTTTGGG GTTIGAGAGG TGAGATACAG 
TGAAAGAAGT GAAGITCGGG AAGGAAGAGA AGIAATGAGT GGGGAAAATG GAAGTGGAAA 
CGArGTGTCT TTACTGGAAA AGCTCTGTTT GTTCGCGTGT TTCTGTGATG GCACGGCGGX 
AGAGTTGAAG GGGTGTGTTG XCGAGAGGGA GAGTGGGGGG TGGGAGTGTG TGGGTGGCAG 
GCATGGTGGA TGGGGAGAGC ATATGGATCG TAGAGATGGG CGCTGAGAGT GTGAGGTGCA 
TTTGGTGTGC CATGGGCAGA ACGTTGAGGT OGITCAGGAA CAGAGTGGIT AGAAGCGAGA 
CGAAGGCAAC CGGTGTGGOG GGTGGTGGGA GGCAAACGTG GCGAGGGGTT TGCAGCGGTC 
TATGTGGTTG AGGTGTGGTA GATGAGGAGG ATGGAAGGGG AGTGGTGGAT GAGTGGAGGG 
GGGGIGCTIT TGTGGTGGGA GACGGTGTCG GTGGGCAGTT GTTGIGTGGA TTGGAGGCTG 
AATGGGGAGA GGGTTGGGTT GGGGGCGGTT TGGGTACAGG TGGAGGGGAT GGAGAGTGTT 
GGGAGGATGG AGGAGGGGGT GTGGGTGGTT ITCATATGGT TGTCAGTTGG AGGAGGATGT 
CTTGAAAATA TGGGTTGTTT GTGTAGGATG TIAAATGITT TTOGAGTATG ATTTTGGAXT 
CAGTAIGTGA ITTCAIGGGG AGAAGAGGGG TATGAGGAGG GAAAGGAGAT TTTAGGATTA 
AACGATGAGT AAAGTGAGGG GAGAGAGGAT ATTTTTGGTT ITTTITGAGA GAGTCTGAGT 
GTGTGAGGGA GGCTGGAGTG GAGTGGGTTC ATGTTGGGTG AGTGGAAGGT GGAGCTCGCA 
TGTTGAGAGG ATTTTGGTGG GTGAGCGTGG GAAGTAGGIG GGAGTACAGG GAGGGAGGAG 
CAGACGGAGG TAATTTTrTT GIATGTTTAG TAGAGATGGG GTTTGACGGA GTTAGGGAGG 
AIGGTGTTGA TCTGGTGAGG TTGTGATGTG GGTGGTTGGG GGTGGTAAAG TGGTCGGATI 
ACAGGGGTGA AGCGGGGTGG GGGGGCAGAG AGGATATTTG TTAATGAGGG GGAGGGGTGG 
GATTCGAGGG GAGTGTTGTG ATGCGTGAGG GAGTGAGGAT TGGAGTAATG GGTGTGGTTT 
TTGAATGTAA AGTTTCAGGG TTGTAGAGGT TGGTTTGAGG TGGGTGAGTA GTTGGATGGT 
GATGTGGGGT CTGAGGGCCA AGAGCTCTGT TCTCATTAAT CAGAGAAGCT TGTGTTTTTA
AAAACACCAT GTTTACTGCA GGAAATTtAA TTGGACAGTG TTTCCATCTG GAAAAAAAAA
AGTCTACAAA ATACTTGACA ATCACTGCAC TAGATCATGC TGCTTTTAGC ATTCTTAGCA
TTTCACGTGC TGAGCTCTCA ATACTCTACC ATGAGGAGGG ATGGAGTGGG TATGAAAAGA
TAAAGAACTG AAGTCACACG GCTTGTCAGT GGCAGAGATA GAGCTTGAAC CGAGGTTGAA
GAGCTCCCGC CTATTCCTTT CCTCTTCTCA CTGGATAAAG CTGCTCCAAG AGAGGTGCTG
CCTCAGTGTG CCTGTTCAGA CTGTAATCCT CCCTTCCTTC CTGCCTCCTC CCTCCTCTCT
CCAGCCCATC ATCTTCGTTT CGGACAGAGC AAACAGCAAC AAGGAGCTGG GTGTGGACCA
GGAGTCAGAG GAGGGCAAAG GCAAAACAAG CCCTGATAAG CAAAAGCAGT CCCCACAGGT
GTCTGGGCAT GTGGCATGGG TGGGGTGGCG AGCACGCTAC AGGGGCTTCC TATGCGCTTG
GGATACACAG GGGCTGGAGG CTTCCCAGGA GTTTGTCTTG AACATCTGGA GGTTTGAATT
TGTCCCACTG ACCTTTTCTT TCAGCAAGTT CCCCTGAAAT TTGGGCTGCT GCTTGGGTGA
ATATCCCAGG ATGGGGGTTC CATTCTAGGA GTGGACTGGC AGGCTGAGCC TCCCATGGAG
CTGATCCAGC CAGGATACAG AGAAGGGGAG GCAAAGGCTG AGACAGAACC AGCTTGAGAG
CGGAGGCGCA ACTCTTGTCT CCTGGTGGCC TTGAGCATTT CACAATAGGG GGATAAAGGA
TAGGAGCAGA AAAGTGGGGC TGACTTCAGA AATGGGGTCC TCTAGAGCTC ACGGGAGGGT
GTTAGATTGG AGTGGGAGCT TAGTGGAGGT GAGCCTTAGA GGCAAAAGTC TCCAGACCAA
TCCAGGCCCC CTCTTCTATC CGGGGGCCCC TCTTCTATCC AGGGCCCCTC TTCTGTCTGG
GAGCCCCTCT TCTATCTGGG GCCTCATGCA GTGGGGCCTA GGGGAGGTTC TCTGAGGACT
TGGCCTTGAT GACAGGGTGG CTGGAGGAAT CAGAACGGTC AGACCTTCTT TGACCTGCGG
GCACCTTTAG TTGGAATGCT CAGGCCTGGG ATGGTGGAGG GGGCTCTTGC AGGTGGGGAC
TGGGGTGGCG GGGAGGAGGC TGTATGGCCG CCATATCTCC TTTGGCTGGG GGCGTCAGGG
CTGGAGAGGT GTGAAGAGTC CCTGAGGCCT CGATGCATCT CACTCCAGCT CACCAGGTCT
GCATTTGCCC GTCCCCAGCT CCTGCTGCCA CCCCCGGCCG TTTTAGGCAC TTGGCTCCCT
TGGCCCAGAG GAGCTTGCCT CACAGGCCTG TGCACCTCTG ACCCCTGTGA ACCAGTTTTC
CTTTGTGCCT CCACAGCCAC AGCCTGGCAA CTCTGATCAG GAAAGTGAGG AACAGCAACA
ATTCCGGAAC ATTTTCAAGC AGATAGCAGG AGATGTGAGT ACCTCCAAGC CCAGGACGCC
CACAGGTGCT TCCTTCTCTC CTGGATTAAC TGCTCAGATT ACCAATTATT TCATTATTGT
TTGGTAGAGG TCACTTTGGA CTTCGGTGGA GCCAGGGGAT GTGTGCGTAG CACACAAATC
CACAAGCCCT TGAGTTTTGG ACTGCCACGT CTGCTGGGGG GCTCAGAGGC CTTTTTGCTC
TGAGCTGCCC ACGGTGGTCC TGATAGCTGA GGTGCAGTAT CTGGCCCCCT GTCTTCCTCA
GAAAAGCCCC AGCTTCCCAT GACATAATAG CAGCGACAGG GATTTTAGAA ACACAGCCAG
GTGGAATTTG TTTTGCAAAG TGTCCGCGCC AGGAGCTGCT GTACTCGTGA ACCATGACCC
TCCTCTCCCT TCCTCCTCAG GACATGGAGA TCTGTGCAGA TGAGCTCAAG AAGGTCCTTA
ACACAGTCGT GAACAAACGT GAGTTGCTCA AACCAAATGG GGGTGGGGTG GGTGGGGAGT
CCCGTTGTCT CAAAGCAGCT CCTCACTCTT CTCCATCCCC CCAGACAAGG ACCTGAAGAC
ACACGGGTTC ACACTGGAGT CCTGCCGTAG CATGATTGCG CTCATGGATG TATCCTTCCT
GCCGCCCCTT CCCGACCCTC TGTCATGAGC CCACGGGGGC CAAGGCAACA TACAGGGTGC
CCAGTCAGGC AAAGGGCCCT AATTTGTGCC CAGGGAAACT TAAGGAGACC CTGATTCAGA
ACATCTTGGA TACTCGTCTG AAAGGGGTTG TTAGAGGCGG AAGGGGAGGA TGTTGGGTTG
TAACTGCCCT AACCCCTGTG CTTCTCTCAG GCCTGGGATC CTGCCCAAGC AAAAGTGGTC
CTTAGGAGAG CGGCTGCTGG GTTACAGAGT AGGCGCAATC TCTGACTGGT GGTGGAGTGG
AGGGGAGGGT TAAATAGTAC AACAGGGCAG TGGGTAGGAC AGCCCGGAGT CTCCTAGACC
CTCCCTCCAA ATCCAGGGGG ATTTTGCTGT GTGCTGTGTA GCCCTGACCT CCCTCCTCCA
GACAGATGGC TCTGGAAAGC TCAACCTGCA GGAGTTCCAC CACCTCTGGA ACAAGATTAA
GGCCTGGCAG GTGGGAAGAG AAAATGAAGC GTGGGAGTCA AGAATGGGGT TGATTTGGAG
ATTCAGTGTG TGACCTCCAT CCTCAAATTT TCTATTGCCA GAAAATTTTC AAACACTATG
ACACAGACCA GTCCGGCACC ATCAACAGCT ACGAGATGCG AAATGCAGTC AACGACGCAG
GTGCTGAGAA GGAAGGGGTG TCAGGGATGT GGACCCGAGA CGGTGGGAGC AGGAATGGGA
GGGGACTAGG TACTAGGGCC CCACTAGAGA AGGAGAGGGA AAGGGCTTCT CACTTTCCCT
TCCCAGGTCA CAGAGTGTCC GAGAGGCAGG GAAAATAGAA GACAGGCCCA AGGCCTCCAG
CTCCACGTCC ACCTCTAACA TGGTCCCCTC CACAGGATTC CACCTCAACA ACCAGCTCTA
TGACATCATT ACCATGCGGT AGGCAGACAA ACACATGAAC ATCGACTTTG ACAGTTTCAT
CTGCTGCTTC GTTAGGCTGG AGGGCATGTT CAGTAAGTGG GAGAGGGGGG CTGCCCTCTG
CTCTCTTGCA GGGGCAGTTG TGGCAACAGG CATCTCACCT GATAATCTCC AGTCTGCTCC
ATCCAGGCTG AACAAGGGCC AATGACCTCT TTAGGCCCAG AATGGGATGG CAAAGGGAGG
GTTACTGGTG ATTCTCTGCC TGCACATCTT TGTGCTGATG AGGGACAGCA CTGGGCACAC
GGTCCTCTGA GGGGAAGTTA CAGTAGTAGA GGCGGAGTGC GCCTGTAACT GGCCTCTGGC
CTGTGCATTC TTTCACAGGA GCTTCTCATG CATTTGACAA GGATGGAGAT GGTATCATCA
AGCTCAACGT TCTGGAGGTA AAGCATAGGC ACAGCACATT CCCCCTACAC ATTAAAACTC
AAGGTGGAGG GGTCAACGGG GCGGACTGGA CCCAGGGTGT GCTCCTCATT TCGACACAGT
GGTGGAGGGA AGGGATAGGA ACAGAACATG GAGGGAGGCT CAGCAGGCTC CGAGGACACA
TGCACTTGAG GCCCAAAAGG ACCTCTGCTC CCCCAGTCAC TTGATGCGGG AAAACATGCA
CCTTCTTAGG GAAGATCTAG GAGAAAGGAA ACAGTAAGCC ACTGCTTCTT GGAAAATCTT
CTGGGGGTCT GACCTGCTGG GACTGTTCCC TTTCCTCTTG CCCCGTAAGA TTCCTAGGGC
GGGGGGGGGG GGGGGTCACT CTTTTCTGAT CTACATTCTG ATCTTGGGAC TTCTTTCAGT
GGCTGCAGCT CACCATGTAT GCCTGAACCA GGCTGGCCTC ATCCAAAGCC ATGCAGGATC
ACTCAGGATT TCAGTTTCAC CCTCTATTTC CAAAGGCATT TACCTCAAAG GACCCAGCAG
CTACACCCCT ACAGGCTTCC AGGCACCTCA TCAGTCATGT TCGTCCTCCA TTTTACCCCC
TACCCATCCT TGATCGGTCA TGCCTAGCCT GACCCTTTAG TAAAGCAATG AGGTAGGAAG
AACAAACCCT TGTCCCTTTG CCATGTGGAG GAAAGTGCCT GCCTCTGGTC CGAGCCGCCT
CGGTTCTGAA GCGAGTGCTC CTGCTTACCT TGCTCTAGGC TGTCTGCAGA AGCACCTGCC
GGTGGCACTC AGCACCTCCT TGTGCTAGAG CCCTCCATCA CCTTCACGCT GTCCCACCAT
GGGCCAGGAA CCAAACCAGC ACTGGGTTCT ACTGCTGTGG GGTAAACTAA CTCAGTGGAA
TAGGGCTGGT TACTTTGGGC TGTCCAACTC ATAAGTTTGG CTGCATTTTG AAAAAAGCTG
ATCTAAATAA AGGCATGTGT ATGGCTGGTC CCCTTGTGTT TTGTTGTCTC ACATTTAGAT
ATCAGCCATG CATGACTGAA TGGCTTCCAA TCATATACTC ACCTATCACC TACAAGAGAA
CAATGAAAAA CACACACAAA AACAAAATCT TGAATTTTGT AATCATGCCT ATTGCTATTT
CTTGAGCATA AGAATGGCTC AGATACTTTC CAAGACATAA AAGGAAGGCA GAGGAATAGT
TGTTGCTGTA AAAGACATCA AGAATAAATG GGGTCATGTA CAAGGGGAGG GGCCGGTTAC
CTGAATAATG GAGTGGAGAT TGAGCTATCC TAGCTCCTCT GCTCACTAAC TGACCTGTCG
CATGACCGTG GACAAAACCC TGAACGCAGC TGTTTGTTTG CTAAACTTCT CTGGACCATG
GCCTGCGGCA TATCTATAGG CATCCTGTGT TTTCCACCCA GTTTCCTTCT TCCTCGCTAA
GCCAACGTGG AAAGGGCTGG CCGTGAATAT GCAGACAAGG TAACGAAAGT AAACCGTCAA
TTAGTAAAAG TACTTCATTT TCCTCTTGTA TTTGCTTCAT TCTTGCTTCA CAAAGTTACG
AAGTCCACAG CTTTATACCA AAATGTAAGA AGGCTATTTG CTTATAAACA TTTTGAGTCA
GGTGTCATCT GATTTCATTC TTCTAATCCA TATTCAATAT TAAAAAATCA GAAACCAAGG
GTGCTGGAGC AGCTCTAGGG CATATATTTC TCTTAAATAG GAGAAAGATT TTCAACAGCT
TTTCCTCCTT GACCCCCTCC TTTCCCAATT TATTTGGGTC ACTACCTTGA ATTTAGAGTG
AATCTGGGAA ATGTAGTCAC CAGG
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